1 Rationale and Objectives

This document details the modeling workflow implemented for estimating dissolved and equilibrium N2O concentrations and saturation ratios using the 2017 National Lakes Assessment (NLA) survey data. The NLA sampling sites were distributed among the target population of US lakes (in the lower 48 states) according to a probabilistic survey design with samples stratified among categories of lake surface area, WSA9 ecoregion, and US state (excluding AK and HI). Due to the stratification scheme, some types of lakes in the sample population were intentionally over-represented (e.g., large lakes) and some were under-represented (e.g., small lakes) relative to the target population. Because of this unequal probability design, estimates from the sample had to be adjusted before making inferences about the broader populations of interest (e.g., national-, state-, ecoregion-, and size class-specific estimates).

The concept of the “complete data likelihood” is useful for conceptualizing biases arising from sampling design (Zachmann et al. 2022; Gelman et al. 2014 Ch. 8; Link and Barker 2010). For the NLA survey data, the population of US lakes in the lower 48 states larger than 4 hectares was considered the complete data and the probabilistic samples were considered a subset of that complete data. The portion of US lakes not included in the sample was considered “missing” from the complete data, not at random, but conditional on the pre-specified design (stratification) variables. This non-random missingness was not ignorable for the purpose of making inferences from the sample to the target population. In a model-based framework, however, including the design variables as predictors in a regression model is one way to adjust for the missingness. For a thorough and recent treatment of this concept in the context of national surveys of environmental resources, refer to Zachmann et al. (2022). This concept is a key motivator for the increasingly popular multilevel regression with poststratification (MRP) approach to model-based inference (Gelman et al. 2014; Gelman, Hill, and Vehtari 2020 Ch. 17).

The following workflow illustrates our model-based approach, based largely on the logic of MRP, but with an elaboration on the poststratification step to enable eventual estimates of total gas flux at the population level, which required scaling up from lake-level estimates. The typical MRP process is carried out in two steps. The first step is to fit regression models for the response variables of interest (e.g., dissolved N2O, equilibrium N2O) conditional on the survey design variables (i.e., ecoregion, state, lake size). The second step is poststratification, wherein the posterior parameter estimates from the regression model for the sample population are weighted based on their known or assumed distribution in the population of interest (i.e., the poststratification table; Gelman, Hill, and Vehtari 2020 Ch. 17). The poststratification table in our case, for example, would be a population summary of lakes among the design variables: ecoregion, state, and size category. However, because we eventually needed lake-level estimates, instead of predicting to a poststratification table, we predicted to each individual lake in the population of interest. This meant predicting to the full target population of 465,897 natural and man-made US lakes larger than 4 hectares in the lower 48 states. These predictions were assumed relevant to average conditions during the “index period” for each lake in 2017. Details about the sampling frame as well as the target population are further clarified in the workbook below with data summaries and code.
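
The weighting logic of the poststratification step can be illustrated with a minimal base R sketch. The cell counts and cell-level estimates below are invented for illustration and are not NLA values; the object names (`cells`, `ps_mean`, `naive_mean`) are likewise hypothetical.

```r
# Toy poststratification: weight cell-level estimates by population cell counts.
# All numbers are made up for illustration only.
cells <- data.frame(
  size_cat = c("4_10", "10_20", "20_50", "50_max"),
  n_lakes  = c(6000, 2500, 1000, 500),  # hypothetical population counts per cell
  est      = c(8.0, 7.5, 7.0, 6.5)      # hypothetical model-based cell means
)

# Poststratified population mean: sum(N_cell * estimate_cell) / sum(N_cell)
ps_mean <- with(cells, sum(n_lakes * est) / sum(n_lakes))

# Unweighted mean of the cell estimates, which ignores how common each cell is
naive_mean <- mean(cells$est)

print(c(poststratified = ps_mean, unweighted = naive_mean))
```

When predictions depend only on the design cells, averaging predictions over every individual lake in the population gives the same population mean as this cell-weighted average; predicting lake by lake simply keeps the lake-level values available for later scaling.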

For the regressions, we used multilevel models fit in a fully Bayesian fashion. Multilevel models are thought to work well in this context because they provide regularized estimates along the design groupings, which can improve out-of-sample inferences (McElreath 2020). Inferences for lake types that may be missing from the sample, but are part of the population of interest, are also straightforward using this approach (Gelman, Hill, and Vehtari 2020 Ch. 17; McElreath 2020). More information on these models, their specific parameters, R code, fit evaluations, and resulting inferences is presented in this document.

The overriding objective of the modeling effort was to provide population level estimates for (1) dissolved and equilibrium N2O concentrations; (2) the N2O saturation ratio (i.e., dissolved N2O/equilibrium N2O); and (3) the proportion of under-saturated water bodies (i.e., saturation ratio < 1). The estimates would also be used to later estimate the total flux of N2O gas attributable to the target population of lakes over the index period. The saturation ratio estimates were calculated as a derived quantity based on the ratio of modeled dissolved to equilibrium N2O. Because dissolved and equilibrium N2O were observed on the same sample units (lake sites), we developed models for estimating their joint distribution. The response variable in the models was, therefore, multivariate to account for potential statistical dependencies between dissolved and equilibrium N2O due to, for example, common dependencies on geography. Although point predictions of the mean marginal probabilities from separate models could be comparable, a joint model allowing correlated observation-level errors (i.e., residuals) was expected to better capture uncertainty and potentially improve out-of-sample predictions, should the variables be conditionally correlated (Warton et al. 2015; Poggiato et al. 2021). All of the models fit were constructed using the brms package (Bürkner 2017) in R (R Core Team 2021) as an interface to Stan, a software package for fitting fully Bayesian models via Hamiltonian Monte Carlo (HMC; Stan Development Team 2018b, 2018c, 2018a).
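
As a rough illustration of what "derived quantity" means here, the sketch below computes a saturation ratio and an under-saturation probability draw by draw. The draws are simulated stand-ins, not output from the fitted models, and they are generated independently for simplicity, whereas draws from a joint model could be correlated. All object names are hypothetical.

```r
set.seed(1)
n_draws <- 4000

# Simulated stand-ins for posterior predictive draws (log scale) for one lake
log_n2o    <- rnorm(n_draws, mean = 2.0, sd = 0.3)  # dissolved N2O
log_n2o_eq <- rnorm(n_draws, mean = 2.1, sd = 0.1)  # equilibrium N2O

# Saturation ratio as a derived quantity, computed for each draw
sat_ratio <- exp(log_n2o) / exp(log_n2o_eq)

# Summaries of the ratio carry the uncertainty of both components
ratio_summary <- quantile(sat_ratio, probs = c(0.025, 0.5, 0.975))
prob_undersat <- mean(sat_ratio < 1)  # share of draws with ratio < 1

print(ratio_summary)
print(prob_undersat)
```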

2 Data

As explained in a previous data munging document (https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dgIndicatorAnalysis.html), duplicate dissolved gas samples were collected at a depth of ~0.1 m at designated index sites distributed across 1091 lakes nationwide, of which 95 were sampled twice as repeat visits. This randomly selected subset of revisit sites was used as a test set for assessing model fit and out-of-sample performance.

Gas samples were analyzed via gas chromatography and concentrations were recorded to the nearest 0.001 nmol/L. The samples were collected under a stratified, unequal probability design and each gas observation was indexed to an individual lake selected with unequal probability from within an aggregated WSA9 or Omernik ecoregion, \(j \in 1,...,J = 9\), a state, \(k \in 1,...,K = 48\), and one of 5 lake size categories, \(l \in 1,...,L = 5\), defined according to surface area (ha). All 9 WSA9 ecoregions were represented in the sample, including Xeric (XER), Western Mountain (WMT), Northern Plains (NPL), Southern Plains (SPL), Temperate Plains (TPL), Coastal Plains (CPL), Upper Midwest (UMW), Northern Appalachian (NAP), and Southern Appalachian (SAP) regions. As shown below, the data from the initial and revisit samples were separately compiled into data frame objects in \(\textbf{R}\), with \(n=984\) and \(n=95\) rows, respectively, of gas observations indexed to the survey design variables and several potentially relevant covariates.

2.1 Import

The gas data and covariates were previously described and munged at https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dataMunge.html. That dataset was imported below.

load( file = paste0( localPath,
              "/Environmental Protection Agency (EPA)/",
              "ORD NLA17 Dissolved Gas - Documents/",
              "inputData/dg.2021-02-01.RData")
      )

save(dg, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda") 

From the imported dataset, a new data frame for modeling was constructed from the original file including only the variables of interest: (1) the N2O gas observations; (2) the survey design variables indexed to those observations; and (3) additional covariates considered potentially useful for improving the fit of the model. The data frame below excluded the second-visit observations, which would later be used for model checking. Some variables from the imported data were renamed for convenience. In addition, the NO3 covariate was rounded according to the documented measurement precision. An alternative version of the NO3 covariate was also created in this step by log-transforming and re-coding it as an ordered factor with five levels at hand-drawn cut points. The left-most cut point separated observations below the detection limit from the completely observed samples. The remaining cut points in the positive direction were drawn at approximately equal distances along the log scale. Finally, it should be noted that one lake that was sampled was missing information on the N2O gas measurements and it was removed from the data frame.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")

dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>% # 1 obs with missing measurement
  nrow() # number of observations before filtering
[1] 1185
df_model <- dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>%
  filter(sitetype == "PROB") %>% # probability samples only
  filter(visit.no == 1) %>%
  mutate(n2o = round(dissolved.n2o.nmol, 2),
         n2o_eq = round(sat.n2o.nmol, 2),
         n2o_sat = n2o.sat.ratio,
         n2o_em = e.n2o.nmol.d,
         n2o_flux = f.n2o.m.d,
         WSA9 = factor(ag.eco9),
         state = factor(state.abb[match(state.nm, state.name)]),
         area_ha = area.ha,
         log_area = log(area_ha),
         chla = chla.result,
         log_chla = log(chla),
         elev = elevation,
         log_elev = log(elev + 1),
         do_surf = o2.surf,
         log_do = log(do_surf),
         bf_max = max.bf,
         sqrt_bf = sqrt(bf_max),
         size_cat = recode(area.cat6, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat,
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
  mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
                       breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
                       labels =seq(1, 5, 1))) %>%
  mutate(no3_cat = factor(no3_cat,
                          levels = seq(1, 5, 1),
                          ordered = TRUE)) %>%
  mutate(date = as.Date(date.col)) %>%
  mutate(jdate = as.numeric(format(date, "%j"))) %>% 
  mutate(lat = map.lat.dd,
         lon = map.lon.dd) %>% # longitude
  mutate(surftemp = surftemp,
         log_surftemp = log(surftemp)) %>% 
  select(WSA9,
         state,
         size_cat,
         site.id,
         lat,
         lon,
         date,
         jdate,
         surftemp,
         log_surftemp,
         area_ha,
         log_area,
         elev,
         log_elev,
         chla,
         log_chla,
         do_surf,
         log_do,
         bf_max,
         sqrt_bf,
         n2o,
         n2o_eq,
         no3,
         no3_cat
         )

save(df_model, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda") 

nrow(df_model) # number of obs after filtering
[1] 984
print(df_model)
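
The NO3 recoding step in the pipeline above can be checked in isolation. The sketch below applies the same censoring floor and the same cut points to a handful of hypothetical NO3 values (`no3_raw` is invented for illustration), using base R only.

```r
# Hypothetical NO3 values, including two at or below the censoring floor
no3_raw <- c(0.0002, 0.0005, 0.0031, 0.0200, 0.1500, 2.5000)

# Same floor and rounding as in the pipeline above
no3 <- ifelse(no3_raw <= 0.0005, 0.0005, round(no3_raw, 4))

# Same log-scale cut points; ordered_result = TRUE returns an ordered factor
no3_cat <- cut(log(no3),
               breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
               labels = 1:5,
               ordered_result = TRUE)

print(data.frame(no3, log_no3 = round(log(no3), 2), no3_cat))
```

Because log(0.0005) is about -7.6, every floored value falls below the first cut point at -7.5 and lands in category 1, which is how the left-most cut point separates the below-detection observations from the rest.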

A second dataframe, including only the second visit observations, was constructed below. These data were later used as a “test set” to assess the out-of-sample fit of the model developed on the first-visit or training data.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")

# number of observations before filtering probability samples
dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>% # remove obs with missing response measurements
  nrow()
[1] 1185
df_test <- dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>%
  filter(sitetype == "PROB") %>% # probability samples only
  filter(visit.no == 2) %>%
  mutate(n2o = round(dissolved.n2o.nmol, 2),
         n2o_eq = round(sat.n2o.nmol, 2),
         n2o_sat = n2o.sat.ratio,
         n2o_em = e.n2o.nmol.d,
         n2o_flux = f.n2o.m.d,
         WSA9 = factor(ag.eco9),
         state = factor(state.abb[match(state.nm, state.name)]),
         area_ha = area.ha,
         log_area = log(area_ha),
         chla = chla.result,
         log_chla = log(chla),
         elev = elevation,
         log_elev = log(elev + 1),
         do_surf = o2.surf,
         log_do = log(do_surf),
         bf_max = max.bf,
         sqrt_bf = sqrt(bf_max),
         size_cat = recode(area.cat6, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat,
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
  mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
                       breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
                       labels =seq(1, 5, 1))) %>%
  mutate(no3_cat = factor(no3_cat,
                          levels = seq(1, 5, 1),
                          ordered = TRUE)) %>%
  mutate(date = as.Date(date.col)) %>%
  mutate(jdate = as.numeric(format(date, "%j"))) %>% 
  mutate(lat = map.lat.dd,
         lon = map.lon.dd) %>% # longitude
  mutate(surftemp = surftemp,
         log_surftemp = log(surftemp)) %>% 
  select(WSA9,
         state,
         size_cat,
         site.id,
         lat,
         lon,
         date,
         jdate,
         surftemp,
         log_surftemp,
         area_ha,
         log_area,
         elev,
         log_elev,
         chla,
         log_chla,
         do_surf,
         log_do,
         bf_max,
         sqrt_bf,
         n2o,
         n2o_eq,
         no3,
         no3_cat
         )

save(df_test, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda") 

nrow(df_test) # number of obs after filtering for probability samples, second visits, and removing one site missing ecoregion (WSA9) info.
[1] 95
print(df_test)

2.2 Target population

Below, the NLA sampling frame was imported and then filtered to include only the target population or sampling frame for this project.

df_pop <- read.csv(file = paste0(localPath,
              "/Environmental Protection Agency (EPA)/",
              "ORD NLA17 Dissolved Gas - Documents/",
              "inputData/NLA_Sample_Frame.csv"), header = T)

sframe <- df_pop %>%
  filter(nla17_sf != "Exclude2017") %>%
  filter(nla17_sf != "Exclude2017_Include2017NH") %>%
  filter(state != "DC") %>%
  filter(state != "HI") %>%
  droplevels() %>%
  mutate(WSA9 = factor(ag_eco9),
         WSA9 = forcats::fct_drop(WSA9), # remove NA level
         state = factor(state),
         size_cat = factor(area_cat6),
         lat = lat_dd83,
         lon = lon_dd83,
         log_area = log(area_ha),
         elev = elevation,
         log_elev = ifelse(elev <= 0, 0, elev), # assumed elev < 0 to be elev = 0
         log_elev = log(log_elev + 1)
         ) %>% 
  mutate(size_cat = recode(size_cat, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat, 
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  select(WSA9, state, size_cat, lat, lon, area_ha, log_area, elev, log_elev)

rm(df_pop)

save(sframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda") 

print(sframe)

The resulting target population above included a total of 465,897 waterbodies.

Cross tabulations below describe the structure of the target population with respect to the design variables. The cross-tabulation makes it clear that not every ecoregion contains every state. Therefore, in the statistical sense, states were nested in ecoregions.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

sframe %>%
  group_by(WSA9, state) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(state, n) %>%
  print()

Likewise, lake size category was nested in state (which was nested in ecoregion). That is, not every ecoregion:state in the population of interest contained every size category (below).

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

sframe %>%
  group_by(WSA9, state, size_cat) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(size_cat, n) %>%
  print()
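
The structure described by these cross tabulations can be sketched on a small invented example. The toy table below is hypothetical (it is not taken from the sampling frame) and only illustrates how empty ecoregion-by-state combinations show up and how an explicit ecoregion:state label can be built in base R.

```r
# Hypothetical lake counts for a few ecoregion-by-state combinations
toy <- data.frame(
  WSA9  = c("CPL", "CPL", "SAP", "SAP", "NAP"),
  state = c("GA",  "FL",  "GA",  "TN",  "NY"),
  n     = c(120, 300, 80, 60, 200)
)

# Ecoregion-by-state table; zeros mark combinations that do not occur
tab <- xtabs(n ~ WSA9 + state, data = toy)
print(tab)

# Number of ecoregions in which each state appears
n_eco_per_state <- colSums(tab > 0)
print(n_eco_per_state)

# Explicit nesting label, mirroring a WSA9:state grouping term
toy$eco_state <- interaction(toy$WSA9, toy$state, sep = ":", drop = TRUE)
print(levels(toy$eco_state))
```

In this toy example one state appears in two ecoregions, so the grouping unit that matters is the ecoregion:state combination rather than the state alone.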

Below, the sampling frame was aggregated to create a poststratification table. Some of the variables were renamed to match the naming conventions used in the observational data above. There were 536 types of lakes in the population of interest with respect to the sampling design. The counts of those lake types (n_lakes) and their proportions relative to the total population of lakes in the sampling frame (prop_cell) are indicated below.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

pframe <- sframe %>%
  mutate(obs = 1) %>%
  group_by(WSA9, state, size_cat) %>%
  summarise(n_lakes = sum(obs), .groups = "drop") %>%
  ungroup() %>%
  mutate(prop_cell = n_lakes/sum(n_lakes)) %>%
  mutate(type = "population") 

save(pframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

print(pframe)

2.3 Sample vs. population

Below, the lake distributions in the population of interest were compared to the proportions in the observed sample. There were 352 lake types in the sample compared to the 536 in the population of interest. In total, there were 984 observations distributed across these 352 lake types in the sample, and the number of samples was not distributed evenly across the types. Some cells were represented by as few as 1 lake. In total, 536-352 = 184 lake types in the population of interest were not represented in the sample.
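
One way to identify which lake types are present in the population but absent from the sample is a left join of the population cells onto the sample cells. The sketch below uses base R `merge()` on two small invented tables (`pop` and `samp` are hypothetical, not the actual `pframe` and `samp_props` objects).

```r
# Hypothetical population and sample cell tables
pop <- data.frame(
  WSA9     = c("CPL", "CPL", "NAP", "NAP", "XER"),
  state    = c("FL", "FL", "NY", "NY", "NV"),
  size_cat = c("4_10", "50_max", "4_10", "10_20", "20_50"),
  n_lakes  = c(500, 40, 300, 120, 15)
)
samp <- data.frame(
  WSA9     = c("CPL", "NAP", "NAP"),
  state    = c("FL", "NY", "NY"),
  size_cat = c("50_max", "4_10", "10_20"),
  n_lakes  = c(3, 2, 1)
)

keys <- c("WSA9", "state", "size_cat")

# Keep every population cell; sample counts are NA where a cell was not sampled
both <- merge(pop, samp, by = keys, all.x = TRUE, suffixes = c("_pop", "_samp"))

# Cell proportions in the population and in the sample (0 for unsampled cells)
both$prop_pop  <- both$n_lakes_pop / sum(both$n_lakes_pop)
both$prop_samp <- ifelse(is.na(both$n_lakes_samp), 0, both$n_lakes_samp) /
  sum(samp$n_lakes)

# Cells in the population with no sampled lakes
unsampled <- both[is.na(both$n_lakes_samp), keys]
print(both)
print(unsampled)
```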

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

samp_props <- df_model %>%
  mutate(obs = 1) %>%
  group_by(WSA9, state, size_cat) %>%
  summarize(n_lakes = sum(obs), .groups = "drop") %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes / sum(n_lakes), 7)) %>%
  mutate(type = "sample") 

save(samp_props, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

print(samp_props)

Below, a graphical comparison was constructed to depict the distribution of cells in the population of interest versus those in the sample.

Another comparison between population and sample was constructed below by ecoregion. The samples were not balanced across ecoregions. Lakes in the Coastal Plains (CPL) ecoregion, for example, were clearly undersampled relative to their proportion of the population.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_eco <- pframe %>%
  group_by(WSA9) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population') 

save(pframe_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_eco.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_eco <- samp_props %>%
  group_by(WSA9) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_eco.rda")

A similar comparison by state was constructed below.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_state <- pframe %>%
  group_by(state) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population')

save(pframe_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_state.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_state <- samp_props %>%
  group_by(state) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_state.rda")

Finally, a comparison by lake size category is shown below. Note that small lakes were under-sampled relative to larger lakes by design.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_size <- pframe %>%
  group_by(size_cat) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population')

save(pframe_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_size.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_size <- samp_props %>%
  group_by(size_cat) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_size.rda")

2.4 Sample-based estimates

The overall mean and standard deviation for N2O in the sample:

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o),
             sd = sd(n2o)) %>%
  print()

The same summary for equilibrium N2O:

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  print()

The saturation ratio (i.e., N2O / N2O-eq):

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  print()

Finally, roughly 67% of lakes in the sample were undersaturated (i.e., saturation ratio < 1):

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(prop_undersat = mean((n2o / n2o_eq) < 1)) %>% # proportion of observations with ratio < 1
  print()

Using only the sample observations, a plot was constructed below of the overall mean (dashed line) along with the ecoregion-specific means (black circles). The shaded areas indicate +/- 1 standard deviation. Neither dissolved N2O nor the saturation ratio was clearly structured by ecoregion in the sample, but there did appear to be some structure in the equilibrium N2O observations.

The same summary by state is below.

Finally, the same summary by size category:

2.5 Sample data exploration

Below, the empirical distribution of N2O observations in the sample was summarized using a density and rug plot. Note the natural log scale of the x axis. Both the N2O and equilibrium N2O data had considerable right skew even after the log transformation, which was not unexpected and has been noted in other studies (Webb et al. 2019). The saturation ratio was also skewed since it was derived from the other two observed variables (i.e., sat_ratio = n2o / n2o_eq).

Below are plots of N2O vs. NO3. The first plot shows log(N2O) vs. log(NO3), as well as the ordinal categories assigned to NO3 (vertical lines). The leftmost vertical line is dashed and separates the NO3 observations below the detection limit.


In the plot above, the trend is increasing and nonlinear on the log scale. The increasing variance in N2O along the NO3 gradient suggested a potential mediator of the effect of NO3 on N2O. Below are plots of N2O vs. NO3 for 6 quantiles of the surface temperature measurements (quantiles increasing from 1 to 6). These plots suggested that the NO3 effect on N2O may have been stronger in lakes with higher observed temperatures.
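
The plotting code is not shown in this document, so the following is only a sketch of one way to form 6 quantile groups of a continuous covariate for paneled plots. The temperatures are simulated and the names `surftemp_sim` and `temp_q` are hypothetical.

```r
set.seed(42)
surftemp_sim <- runif(200, min = 10, max = 32)  # simulated surface temperatures

# Break points at the 0, 1/6, ..., 1 quantiles give 6 roughly equal-sized groups
brks <- quantile(surftemp_sim, probs = seq(0, 1, length.out = 7))

# include.lowest = TRUE keeps the minimum value in the first group
temp_q <- cut(surftemp_sim, breaks = brks, labels = 1:6, include.lowest = TRUE)

print(table(temp_q))
```

A grouping factor such as `temp_q` could then be used as the paneling variable in a plot of log(N2O) against log(NO3).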


The next plot below shows the relationship between N2O and NO3 at 6 different quantiles (increasing 1 to 6) of the log-scaled lake surface area estimates.


Similar plots are below, but with NO3 expressed as an ordered categorical variable with 5 levels. The positive and monotonic trends are similar to the previous plots where NO3 was treated as continuous. Note the large number of observations in the first NO3 category (no3_cat = 1). This category represented all of the censored observations for NO3, which was most of the data.

Below is a plot of log(N2O) vs. log(NO3) by ecoregion, which suggested that the NO3 effect on N2O may have varied by ecoregion.


Below is the same plot as above but for the ordered categorical version of NO3.

A plot below shows trends by state within just the Temperate Plains (TPL) ecoregion. Within states, the number of observations was relatively small, but the trends appeared closer to linear.


3 Model fitting

The first regression model was constructed to estimate the joint distribution of log-transformed N2O and equilibrium N2O conditional on the design factors. Each log-transformed observation, \(i \in 1,...,N = 984\), for each response, \(p \in 1,...,P = 2\), was assumed to be drawn from a multivariate normal distribution with the parameters \(\nu\) and \(\Sigma\), where \(\nu\) is the multivariate mean estimated conditional on the design effects and \(\Sigma\) is a covariance matrix containing the observation-level variances and residual correlation: \[Y \sim MVN(\nu, \Sigma)\]

The multivariate mean is a vector of mean parameters, \(\nu = [\mu_{p=1}, \mu_{p=2}]\), one for each response. Each mean is further defined by a linear combination of parameters where, for each response \(p\) and observation \(i\):

\[\mu_{pi} = \alpha_{0(pi)} + \alpha_{1(pij)} + \alpha_{2(pijk)} + \alpha_{3(pijkl)} \\ \alpha_1 \sim MVN(0, \Lambda_1) \\ \alpha_2 \sim MVN(0, \Lambda_2) \\ \alpha_3 \sim MVN(0, \Lambda_3)\]

The linear combination of parameters defining \(\mu\) above includes a fixed global intercept, \(\alpha_0\), that is estimated directly from the data, and three separate, latent group-level effects matrices, \(\alpha_1, \alpha_2, \alpha_3\). The group effects were assumed to be multivariate normal and centered on zero in multivariate space. The spread of each set of effects around zero is determined by a covariance matrix, \(\Lambda_1, \Lambda_2, \text{or } \Lambda_3\), which is estimated directly from the data. These covariance terms are further defined where:

\[\Lambda = \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix} \chi \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix}\]

The \(\tau\) parameters are the group-level scale parameters, which constrain the spread of effects for each response, and \(\chi\) comprises the group-level residual correlation matrix:

\[\chi = \begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}\]

wherein \(\varrho\) is the group-level residual correlation between responses.

The explicit indexing in the notation above conveys the relationship between the parameters and each observation, \(i\), and emphasizes the nested structure of the observations within the group effects. Specifically, every observation, \(i\), was nested in a lake size category, \(l\), which was nested in a state, \(k\), and ecoregion, \(j\). The parameter \(\alpha_1\), therefore, accounted for ecoregion-scale group effects or deviations from the global mean; \(\alpha_2\) accounted for state-level group effects nested in ecoregions; and \(\alpha_3\) accounted for lake size group effects within states and ecoregions.

Finally, the observation-level covariance term, \(\Sigma\), was parameterized as: \[\Sigma = \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix} \Omega \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix}\]

wherein the \(\sigma\) parameters are the observation-level standard deviations for each response and \(\Omega\) comprises the observation-level residual correlation matrix: \[\Omega = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\] wherein \(\rho\) is the residual correlation between responses.
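
To make the observation-level covariance structure concrete, the sketch below assembles a 2 x 2 covariance matrix from standard deviations and a correlation using the usual scale-correlation decomposition, \(\text{diag}(\sigma)\,\Omega\,\text{diag}(\sigma)\), and simulates bivariate normal responses from it. The parameter values are invented for illustration and are not estimates from the fitted model.

```r
# Hypothetical observation-level standard deviations and residual correlation
sigma <- c(0.4, 0.1)
rho   <- 0.3

# Correlation matrix and covariance matrix: Sigma = diag(sigma) Omega diag(sigma)
Omega <- matrix(c(1, rho, rho, 1), nrow = 2)
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)
print(Sigma)  # variances on the diagonal, rho * sigma1 * sigma2 off the diagonal

# Simulate from MVN(nu, Sigma) using the Cholesky factor of Sigma
set.seed(7)
n  <- 5000
nu <- c(2.0, 2.1)                       # hypothetical log-scale means
z  <- matrix(rnorm(n * 2), ncol = 2)    # independent standard normal draws
y  <- sweep(z %*% chol(Sigma), 2, nu, "+")

# The sample correlation should be close to rho
print(cor(y)[1, 2])
```

The same construction applies to the group-level covariance matrices, with the group-level standard deviations and correlation in place of the observation-level ones.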

For model fitting, priors were needed for all parameters conditioned directly on the data, which included the global intercept, the scale parameters, and the correlation matrices. A normal or Gaussian prior, \(N(\mu = 2, \sigma = 1)\), centered near the (log-scale) data means, was used for the global intercept parameter for each response. This prior was considered minimally informative as it placed most (~80%) of the prior mass over values between approximately 2 and 27 nmol/L for median N2O or N2O equilibrium concentration and included support in the tails for values approaching 0 nmol/L on the lower end and 80 nmol/L on the high end. We placed \(Exp(2)\) priors over all scale parameters, which placed most of the support between values very close to 0 and values near 1 (central 80% density interval from approximately 0.05 to 1.15). Finally, for the correlation matrices, an \(LKJ(\eta =2)\) prior was used, which, for a 2-dimensional response, placed most support for correlations between approximately -0.9 and 0.9. This prior seemed reasonable as there were no clear causal mechanisms that were thought to ensure a strong direct correlation between the N2O measures. Any potential residual dependence was expected to be indirect due to, for example, a common causal factor (e.g., elevation, temperature). For more information on prior choice recommendations in Stan, see: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
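
The prior intervals quoted above can be checked numerically with base R quantile functions. This is a sketch rather than part of the original workflow; it relies on the fact that, for a 2 x 2 correlation matrix, an LKJ(\(\eta\)) prior implies that \((r + 1)/2\) follows a Beta(\(\eta\), \(\eta\)) distribution for the single correlation \(r\).

```r
# Central 80% interval of the N(2, 1) intercept prior, back-transformed from the log scale
int_80 <- exp(qnorm(c(0.10, 0.90), mean = 2, sd = 1))

# Central 80% interval of the Exp(2) prior on the scale parameters
scale_80 <- qexp(c(0.10, 0.90), rate = 2)

# Central 95% interval for the correlation under LKJ(eta = 2) in two dimensions
eta    <- 2
cor_95 <- 2 * qbeta(c(0.025, 0.975), eta, eta) - 1

print(round(int_80, 2))
print(round(scale_80, 3))
print(round(cor_95, 2))
```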

The \(\textbf{brms}\) package (Bürkner 2017) for \(\textbf{R}\) (R Core Team 2021) was used to fit all of the models in a fully Bayesian setting. The formula syntax of the \(\textbf{brms}\) package is similar to the syntax used in the \(\textbf{lme4}\) package that is widely used to fit mixed effects models in frequentist settings. In either package, the linear predictor for \(\mu\) described above could be expressed as:

\[\sim 1 + (1|WSA9) + (1|WSA9:state) + (1|WSA9:state:size)\]

In the \(\textbf{brms}\) package, there is additional functionality and syntax for multivariate responses and for allowing the varying intercepts in a multivariate model to be correlated, e.g.:

\[ N_2O_{dissolved}\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) + (1|c|WSA9:state:size) \\ N_2O_{equilibrium}\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) + (1|c|WSA9:state:size)\]

The above syntax indicates that the linear predictors for both responses in the multivariate model have the same group-level varying effects, and that each of those effects is allowed to be correlated between responses.

For the remainder of this document, only this simplified syntax is presented to describe the model parameterizations. For more information on \(\textbf{brms}\) functionality and syntax with multivariate response models, the package vignette may be helpful, and can be found at: https://cran.r-project.org/web/packages/brms/vignettes/brms_multivariate.html.
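To make the nesting in the grouping terms concrete, the sketch below uses a small hypothetical data frame (the rows are invented for illustration and are not NLA data) to show how a term such as WSA9:state defines one group per observed ecoregion-by-state combination. Under this construction, a state that spans more than one ecoregion contributes more than one group level.

```r
# Toy data: six hypothetical lakes (not the NLA sample)
toy <- data.frame(
  WSA9     = c("NAP", "NAP", "SAP", "SAP", "CPL", "CPL"),
  state    = c("NY",  "PA",  "PA",  "VA",  "VA",  "VA"),
  size_cat = c("small", "large", "small", "small", "small", "large")
)

# Groups equivalent to those implied by (1 | WSA9:state)
g_state <- interaction(toy$WSA9, toy$state, drop = TRUE, sep = ":")

# Groups equivalent to those implied by (1 | WSA9:state:size_cat)
g_size <- interaction(toy$WSA9, toy$state, toy$size_cat, drop = TRUE, sep = ":")

nlevels(factor(toy$state))  # 3 distinct states
nlevels(g_state)            # 5 ecoregion-by-state groups
nlevels(g_size)             # 6 ecoregion-by-state-by-size groups
```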

3.1 Model 1

The first model fit was the one described above.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ 1 + 
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ 1 + 
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) +
               (1 | c | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"), # centered near data mean
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(exponential(2), class = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), # centered near data mean
  prior(exponential(2), class = "sd", resp = "logn2oeq"),
  prior(exponential(2), class = "sigma", resp = "logn2oeq"),
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod1 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                control = list(adapt_delta = 0.99, max_treedepth = 14),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 145,
                chains=4, 
                iter=5000, 
                cores=4)

save(n2o_mod1, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")

3.1.1 Summarize fit

The summaries of the estimated parameters and key HMC convergence diagnostics for the fitted model are printed below. There were no obvious issues with the HMC sampling. All \(\hat{R}\) values were at or below 1.01 (as rounded in the summary) and effective sample size (\(ESS\)) calculations suggested that the posterior contained a sufficient number of effective samples for conducting inference.

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: log(n2o) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         log(n2o_eq) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
   Data: df_model (Number of observations: 984) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Priors: 
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2oeq ~ normal(2, 1)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sigma_logn2o ~ exponential(2)
<lower=0> sigma_logn2oeq ~ exponential(2)

Group-Level Effects: 
~WSA9 (Number of levels: 9) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.04      0.03     0.00     0.13 1.00
sd(logn2oeq_Intercept)                       0.06      0.02     0.03     0.11 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.05      0.44    -0.80     0.83 1.01
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         2475     4230
sd(logn2oeq_Intercept)                       3326     5332
cor(logn2o_Intercept,logn2oeq_Intercept)     1141     3129

~WSA9:state (Number of levels: 96) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.26      0.03     0.21     0.32 1.00
sd(logn2oeq_Intercept)                       0.04      0.01     0.03     0.05 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.24      0.15    -0.07     0.52 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         3199     4775
sd(logn2oeq_Intercept)                       3549     5476
cor(logn2o_Intercept,logn2oeq_Intercept)     3332     5553

~WSA9:state:size_cat (Number of levels: 352) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.09      0.04     0.01     0.16 1.00
sd(logn2oeq_Intercept)                       0.02      0.01     0.00     0.03 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)    -0.22      0.37    -0.85     0.58 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                          863     1397
sd(logn2oeq_Intercept)                        928     1365
cor(logn2o_Intercept,logn2oeq_Intercept)     1661     2807

Population-Level Effects: 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept       2.02      0.04     1.95     2.09 1.00     2641     3963
logn2oeq_Intercept     2.00      0.02     1.96     2.04 1.00     3446     4550

Family Specific Parameters: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_logn2o       0.40      0.01     0.37     0.42 1.00     2749     5523
sigma_logn2oeq     0.08      0.00     0.08     0.09 1.00     4790     6698

Residual Correlations: 
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq)     0.21      0.04     0.14     0.27 1.00     6172     7243

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
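To illustrate what the Rhat column measures, the base R sketch below implements the classic split \(\hat{R}\) for a single parameter and applies it to simulated chains. This is a simplified stand-in: the function name `split_rhat` is hypothetical, and the value printed in the summary above may come from a refined (e.g., rank-normalized) variant of this statistic.

```r
# Classic split R-hat for one parameter; `draws` is an iterations x chains matrix.
split_rhat <- function(draws) {
  n_iter <- nrow(draws)
  half   <- n_iter %/% 2
  # Split every chain into first and second halves (2 * chains columns).
  halves <- cbind(draws[seq_len(half), , drop = FALSE],
                  draws[(n_iter - half + 1):n_iter, , drop = FALSE])
  W <- mean(apply(halves, 2, var))   # average within-chain variance
  B <- half * var(colMeans(halves))  # between-chain variance
  var_plus <- (half - 1) / half * W + B / half
  sqrt(var_plus / W)
}

set.seed(1)
mixed <- matrix(rnorm(2500 * 4), ncol = 4)  # four well-mixed chains
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 2                # one chain exploring a different region

split_rhat(mixed)  # close to 1
split_rhat(stuck)  # clearly above 1.01
```

Values near 1 indicate that the between-chain and within-chain variances agree, which is the pattern expected when all chains have converged to the same distribution.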

In the summary above, the estimated standard deviations for the varying group effects on the mean behavior of the dissolved N2O response suggested fairly low, but non-zero variability across each of the three levels. The standard deviations estimated for the same varying effects for equilibrium N2O were also relatively small. Finally, note the relatively small, but positive residual correlation between the two N2O responses.
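As a rough aid to reading these standard deviations, the sketch below converts the posterior means reported above for dissolved N2O into approximate shares of the total (log-scale) variance. This is a back-of-the-envelope calculation on point estimates only; it ignores posterior uncertainty and is not a substitute for a draw-wise variance decomposition.

```r
# Posterior means from the Model 1 summary (dissolved N2O, log scale)
sds <- c(ecoregion = 0.04, state = 0.26, size = 0.09, residual = 0.40)

# Approximate share of total variance at each level
share <- sds^2 / sum(sds^2)
round(100 * share, 1)  # roughly 0.7, 28.5, 3.4, and 67.4 percent
```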

Before investing too much into the interpretation of this model, however, the model fit was evaluated below using a series of graphical posterior predictive checks [PPC; Gelman et al. (2014); Gelman, Hill, and Vehtari (2020), Ch. 11].

3.1.2 Model checks

3.1.2.1 Dissolved N2O

Below are a series of panels illustrating graphical PPCs for the log(N2O) component of the model. The top left panel compares a density plot of the observed data (black line) to density lines drawn for 200 samples from the posterior predictive distribution (PPD; blue lines) of the fitted model. The top right panel similarly compares the cumulative distributions. The left middle panel simultaneously compares means vs. standard deviations for 1000 draws from the PPD (blue dots) to the sample mean and standard deviation (black dot). The right middle panel compares skewness vs. kurtosis for 1000 draws from the PPD to the skewness and kurtosis values calculated for the observed data. The bottom left panel compares max vs. min values for 1000 draws from the PPD to the max and min values of the sample data. Finally, the bottom right panel shows the observed vs. average predicted values for each observation in the sample. The average predicted values were calculated as the mean prediction for each observation in the PPD based on 1000 draws.

The general takeaway from the PPCs above was that the model replicated the central tendency of the observed data fairly well, but failed to sufficiently replicate other important aspects of the distribution, such as skewness and kurtosis. The observed vs. average predictions scatterplot suggested substantial heteroscedasticity in the errors.
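The logic of these checks can be illustrated with a toy example in base R. The sketch below is not the model above: it fits a simple Gaussian model (with the standard noninformative prior) to simulated right-skewed data, draws replicated data sets from the posterior predictive distribution, and compares test statistics. All names and simulated values are hypothetical.

```r
set.seed(42)

# Toy "observed" data: right-skewed, so a Gaussian model is misspecified.
y <- rlnorm(200, meanlog = 0, sdlog = 0.75)
n <- length(y)

skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Posterior draws for a Gaussian model under the prior p(mu, sigma^2) ~ 1 / sigma^2.
n_draws <- 1000
sigma2  <- (n - 1) * var(y) / rchisq(n_draws, df = n - 1)
mu      <- rnorm(n_draws, mean = mean(y), sd = sqrt(sigma2 / n))

# One replicated data set per posterior draw (rows = draws, columns = observations).
# Column-major filling recycles mu and sigma2 so that row s uses draw s.
y_rep <- matrix(rnorm(n_draws * n, mean = mu, sd = sqrt(sigma2)), nrow = n_draws)

# Compare observed test statistics with their posterior predictive distributions.
stats_obs <- c(mean = mean(y), sd = sd(y), skewness = skewness(y))
stats_rep <- cbind(mean     = rowMeans(y_rep),
                   sd       = apply(y_rep, 1, sd),
                   skewness = apply(y_rep, 1, skewness))
ppc <- cbind(observed = stats_obs,
             t(apply(stats_rep, 2, quantile, probs = c(0.05, 0.95))))
round(ppc, 2)
```

In this toy case the replicated means bracket the observed mean, while the observed skewness falls far outside the replicated range, which mirrors the kind of misfit described above.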

The same checks were run below, but for the test set of 95 held-out, second-visit data points.

The patterns in misfit indicated above for the re-visit data were similar to the patterns indicated in the PPCs with the training data.

3.1.2.2 Equilibrium N2O

Below are PPCs for the equilibrium N2O component of the model. As with the dissolved N2O response above, the model replicated the central tendency reasonably well, but performed less well at replicating some important aspects of the overall distribution.

Below are the same PPCs for equilibrium N2O in the re-visit sites.

3.1.2.3 Bivariate

The graphical check below compares bivariate density contours estimated from the observed data (black lines) to density contours estimated for each of 20 draws from the PPD. The model appeared to do a good job of replicating the bivariate mean, but was poor at representing the overall joint distribution.

The same bivariate check is shown below for the re-visit data.

3.1.2.4 Saturation

The graphical PPCs below were aimed at evaluating how well the multivariate model did at representing the observed saturation ratio: \[N_2O_{dissolved}:N_2O_{equilibrium}\] This quantity was estimated as a derived variable by simply dividing the N2O PPD by the equilibrium N2O PPD. Likewise, the proportion of under-saturated lakes in the sample was estimated by summing the number of lakes from each posterior predictive draw wherein the ratio was < 1 and dividing that number by the total number of lakes in the sample, which was 984. Overall, these checks indicated that properly representing the tails of the N2O and N2O-eq observations would likely be necessary in order to better replicate the observed saturation metrics. For example, the model did a poor job replicating the observed proportion of under-saturated lakes, underestimating it by more than 10 percentage points, on average.

The top left panel, above, is a density plot of the observed saturation ratio (black line) compared to an estimate using 50 draws from the model (blue lines). The top right panel shows the observed proportion of under-saturated lakes compared to a model estimate based on 1000 draws from the PPD. The left middle panel shows the mean vs. standard deviation of the saturation ratio for the observed data compared to the same estimates for 500 posterior draws from the model’s PPD. The right middle panel shows the max vs. min for the sample compared to 500 draws from the model’s PPD. Finally, the bottom left panel shows the observed vs. average predicted saturation ratio for all 984 lakes sampled in the dataset.
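The derived saturation quantities described above can be sketched as follows. The matrices below are simulated stand-ins for posterior predictive draws on the log scale (rows are draws, columns are lakes); in practice they would be extracted from the fitted multivariate model. The object names and the simulated means and standard deviations are assumptions made for illustration.

```r
set.seed(7)
n_draws <- 500
n_lakes <- 984

# Stand-ins for posterior predictive draws on the log scale (draws x lakes).
log_n2o_rep   <- matrix(rnorm(n_draws * n_lakes, mean = 2.0, sd = 0.45), nrow = n_draws)
log_n2oeq_rep <- matrix(rnorm(n_draws * n_lakes, mean = 2.0, sd = 0.10), nrow = n_draws)

# Saturation ratio for every draw and lake: dissolved / equilibrium.
sat_rep <- exp(log_n2o_rep) / exp(log_n2oeq_rep)

# Proportion of under-saturated lakes (ratio < 1) within each draw.
prop_under <- rowMeans(sat_rep < 1)

# Posterior predictive summary of the under-saturated proportion.
quantile(prop_under, c(0.025, 0.5, 0.975))
```

Because the ratio is computed draw by draw, the resulting distribution of `prop_under` carries the joint predictive uncertainty of both responses, including their residual correlation when real model draws are used.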

The same PPCs are shown below for the re-visit data.

The checks above indicated that the model did a similarly underwhelming job of replicating some key properties of the saturation metrics calculated from the re-visit data.

3.1.2.5 R-square

Below, the Bayesian \(R^2\) values are reported for each response in the model.

         Estimate Est.Error  Q2.5 Q97.5
R2logn2o    0.247     0.031 0.187 0.309
           Estimate Est.Error  Q2.5 Q97.5
R2logn2oeq    0.377     0.025 0.328 0.425

The \(R^2\) values were also estimated for the re-visit data.

         Estimate Est.Error  Q2.5 Q97.5
R2logn2o    0.413      0.04 0.331 0.489
           Estimate Est.Error  Q2.5 Q97.5
R2logn2oeq    0.322     0.026 0.271 0.374
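For context on how such values can be computed, the sketch below implements one commonly used draw-wise definition of the Bayesian \(R^2\): the variance of the fitted values divided by the sum of that variance and the residual variance, evaluated for each posterior draw. Whether this matches the exact computation behind the values above is an assumption; the function name `bayes_r2` and the simulated inputs are hypothetical.

```r
set.seed(11)
n <- 100
n_draws <- 400

# Toy data and stand-in posterior draws of the regression coefficients.
x  <- rnorm(n)
y  <- 1 + 0.5 * x + rnorm(n, sd = 1)
b0 <- rnorm(n_draws, mean = 1.0, sd = 0.1)
b1 <- rnorm(n_draws, mean = 0.5, sd = 0.1)

# Expected value of each observation for each draw (draws x observations).
mu_draws <- outer(b0, rep(1, n)) + outer(b1, x)

bayes_r2 <- function(y, mu_draws) {
  resid   <- sweep(mu_draws, 2, y)    # fitted minus observed, per draw
  var_fit <- apply(mu_draws, 1, var)  # variance of fitted values, per draw
  var_res <- apply(resid, 1, var)     # variance of residuals, per draw
  var_fit / (var_fit + var_res)
}

r2 <- bayes_r2(y, mu_draws)
c(Estimate = mean(r2), quantile(r2, c(0.025, 0.975)))
```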

3.2 Model 2

In an attempt to better fit the observed data, the next model included distributional sub-models to allow for heterogeneous variances for each response conditional on the survey design structure.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ 1 +
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ 1 +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod2 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 84512,
                chains = 4,
                iter = 5000,
                cores = 4)

save(n2o_mod2, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")

3.2.1 Summarize fit

The summaries of the estimated parameters and key HMC convergence diagnostics for the fitted model are printed below.

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = log
         mu = identity; sigma = log 
Formula: log(n2o) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
         log(n2o_eq) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
   Data: df_model (Number of observations: 984) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Priors: 
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)

Group-Level Effects: 
~WSA9 (Number of levels: 9) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.06      0.03     0.02     0.12 1.00
sd(logn2oeq_Intercept)                       0.05      0.02     0.03     0.09 1.00
sd(sigma_logn2o_Intercept)                   0.22      0.12     0.02     0.50 1.00
sd(sigma_logn2oeq_Intercept)                 0.16      0.08     0.04     0.34 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.57      0.30    -0.18     0.95 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         2247     1772
sd(logn2oeq_Intercept)                       3423     4761
sd(sigma_logn2o_Intercept)                   1892     3094
sd(sigma_logn2oeq_Intercept)                 2413     1991
cor(logn2o_Intercept,logn2oeq_Intercept)     2696     3538

~WSA9:state (Number of levels: 96) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.08      0.02     0.04     0.11 1.00
sd(logn2oeq_Intercept)                       0.04      0.01     0.03     0.05 1.00
sd(sigma_logn2o_Intercept)                   0.56      0.08     0.40     0.74 1.00
sd(sigma_logn2oeq_Intercept)                 0.22      0.05     0.11     0.32 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.42      0.19    -0.00     0.74 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         1276     1733
sd(logn2oeq_Intercept)                       2695     5090
sd(sigma_logn2o_Intercept)                   2007     3462
sd(sigma_logn2oeq_Intercept)                 1851     1881
cor(logn2o_Intercept,logn2oeq_Intercept)     1076     1960

~WSA9:state:size_cat (Number of levels: 352) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.06      0.01     0.03     0.09 1.00
sd(logn2oeq_Intercept)                       0.02      0.01     0.00     0.03 1.00
sd(sigma_logn2o_Intercept)                   0.61      0.05     0.51     0.72 1.00
sd(sigma_logn2oeq_Intercept)                 0.19      0.06     0.07     0.29 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)    -0.00      0.34    -0.70     0.62 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         1329     2182
sd(logn2oeq_Intercept)                       1052     1452
sd(sigma_logn2o_Intercept)                   2622     4939
sd(sigma_logn2oeq_Intercept)                 1603     1725
cor(logn2o_Intercept,logn2oeq_Intercept)      996     2063

Population-Level Effects: 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept             1.93      0.03     1.88     1.99 1.00     3338     4462
sigma_logn2o_Intercept      -1.40      0.12    -1.63    -1.17 1.00     3511     5077
logn2oeq_Intercept           2.00      0.02     1.96     2.03 1.00     3384     4145
sigma_logn2oeq_Intercept    -2.60      0.07    -2.75    -2.45 1.00     4004     4848

Residual Correlations: 
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq)     0.36      0.03     0.29     0.43 1.00     6474     7733

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

From the summary above, note the moderate and positive residual correlation between the two N2O responses. The estimated standard deviations for the varying group effects on the mean behavior of the dissolved N2O response suggested fairly low, but non-zero variability across each of the three levels. The standard deviations estimated for the same varying effects for equilibrium N2O were also relatively small. However, before investing too much into the interpretation of these results, the model fit was further evaluated below using a series of graphical posterior predictive checks (PPCs).
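Because the \(\sigma\) sub-models use a log link, their intercepts and group-level standard deviations are on the log scale. The short sketch below back-transforms the posterior means from the summary above to the scale of the residual standard deviation; it uses point estimates only and is intended as a reading aid rather than a formal summary.

```r
# Posterior mean intercepts of the sigma sub-models (log scale), from the summary
log_sigma <- c(n2o = -1.40, n2o_eq = -2.60)

# Implied residual SDs for a typical group
sigma_typical <- exp(log_sigma)
round(sigma_typical, 3)  # about 0.247 and 0.074

# Group-level SD of log(sigma) across ecoregion-state-size groups (dissolved N2O)
sd_size <- 0.61

# Residual SD for groups one group-level SD below and above the typical group
sigma_range <- exp(log_sigma[["n2o"]] + c(-1, 1) * sd_size)
round(sigma_range, 3)    # about 0.134 and 0.454
```

On this reading, the residual spread of dissolved N2O differs several-fold among size-class groups, which is the heterogeneity the distributional sub-models were added to capture.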

3.2.2 Model checks

Below the same PPCs were performed as with the initial model (see above for more details on each panel).

Dissolved N2O

Though the checks below suggest some improvement in replicating the tails of the observed data, this model did a poorer job at replicating central tendency.

3.2.2.1 Equilibrium N2O

The checks below suggest this model offered no improvement upon the initial model for equilibrium N2O. This model also appeared to do a poorer job of replicating the mean and overall standard deviation compared to the initial model.

3.2.2.2 Bivariate

This check perhaps suggested an improvement with regard to replicating the joint density. However, the predictions were still clearly over-dispersed relative to the observations.

3.2.2.3 Saturation

The PPCs for the saturation metrics below indicated that including the distributional models was perhaps an improvement on the initial model in some aspects; in particular, the bias in the predicted proportion of under-saturated lakes was substantially decreased. However, there appeared to still be issues in replicating the tails as well as issues with central tendency.

3.2.2.4 R-square

Relative to model 1, there was a substantial decrease in the \(R^2\) estimate for the dissolved N2O component of this model. The estimate for the equilibrium N2O component was similar to that of model 1.

         Estimate Est.Error  Q2.5 Q97.5
R2logn2o    0.056     0.009 0.038 0.075
           Estimate Est.Error  Q2.5 Q97.5
R2logn2oeq    0.379     0.022 0.335 0.421

3.3 Model 3

In the next model, we used covariates to try to improve the fit. The categorical version of the NO3 covariate was used as a monotonic ordinal predictor in the dissolved N2O component of the model. For the equilibrium N2O component, we included surface temperature and log-transformed elevation, along with their interaction. The models also retained the distributional specifications included in model 2 above.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod3 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 98456,
                chains = 4,
                iter = 5000,
                cores = 4)

save(n2o_mod3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")

3.3.1 Summarize fit

The fitted parameters and MCMC diagnostics are below.

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = log
         mu = identity; sigma = log 
Formula: log(n2o) ~ mo(no3_cat) + surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
         log(n2o_eq) ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
   Data: df_model (Number of observations: 984) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Priors: 
b_logn2o ~ normal(0, 1)
b_logn2oeq ~ normal(0, 1)
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
simo_logn2o_mono3_cat1 ~ dirichlet(1)

Group-Level Effects: 
~WSA9 (Number of levels: 9) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.05      0.02     0.02     0.10 1.00
sd(logn2o_mono3_cat)                         0.14      0.05     0.06     0.26 1.00
sd(logn2oeq_Intercept)                       0.04      0.01     0.02     0.07 1.00
sd(sigma_logn2o_Intercept)                   0.12      0.08     0.01     0.31 1.00
sd(sigma_logn2oeq_Intercept)                 0.36      0.11     0.19     0.64 1.00
cor(logn2o_Intercept,logn2o_mono3_cat)      -0.14      0.33    -0.72     0.53 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.36      0.29    -0.28     0.83 1.00
cor(logn2o_mono3_cat,logn2oeq_Intercept)     0.37      0.29    -0.26     0.82 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         4265     3917
sd(logn2o_mono3_cat)                         4747     5482
sd(logn2oeq_Intercept)                       4798     5704
sd(sigma_logn2o_Intercept)                   3419     4599
sd(sigma_logn2oeq_Intercept)                 5027     6638
cor(logn2o_Intercept,logn2o_mono3_cat)       4970     6018
cor(logn2o_Intercept,logn2oeq_Intercept)     5073     6636
cor(logn2o_mono3_cat,logn2oeq_Intercept)     7256     7538

~WSA9:state (Number of levels: 96) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.03      0.02     0.00     0.07 1.00
sd(logn2o_mono3_cat)                         0.14      0.02     0.10     0.18 1.00
sd(logn2oeq_Intercept)                       0.03      0.00     0.03     0.04 1.00
sd(sigma_logn2o_Intercept)                   0.30      0.09     0.09     0.47 1.00
sd(sigma_logn2oeq_Intercept)                 0.28      0.06     0.17     0.40 1.00
cor(logn2o_Intercept,logn2o_mono3_cat)      -0.30      0.32    -0.82     0.45 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.00      0.27    -0.53     0.55 1.01
cor(logn2o_mono3_cat,logn2oeq_Intercept)     0.45      0.13     0.16     0.69 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                          894     2100
sd(logn2o_mono3_cat)                         3795     5396
sd(logn2oeq_Intercept)                       4164     5957
sd(sigma_logn2o_Intercept)                   1157     1279
sd(sigma_logn2oeq_Intercept)                 2747     4313
cor(logn2o_Intercept,logn2o_mono3_cat)        550      693
cor(logn2o_Intercept,logn2oeq_Intercept)      596      971
cor(logn2o_mono3_cat,logn2oeq_Intercept)     2536     4431

~WSA9:state:size_cat (Number of levels: 352) 
                                         Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept)                         0.06      0.01     0.04     0.08 1.00
sd(logn2oeq_Intercept)                       0.00      0.00     0.00     0.01 1.00
sd(sigma_logn2o_Intercept)                   0.58      0.06     0.47     0.70 1.00
sd(sigma_logn2oeq_Intercept)                 0.28      0.05     0.18     0.39 1.00
cor(logn2o_Intercept,logn2oeq_Intercept)     0.26      0.39    -0.60     0.87 1.00
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         1429     2386
sd(logn2oeq_Intercept)                       1779     3895
sd(sigma_logn2o_Intercept)                   2003     4421
sd(sigma_logn2oeq_Intercept)                 1812     3745
cor(logn2o_Intercept,logn2oeq_Intercept)     4523     5554

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept               2.40      0.05     2.29     2.50 1.00     5046     6296
sigma_logn2o_Intercept        -1.70      0.08    -1.87    -1.55 1.00     5162     5452
logn2oeq_Intercept             3.10      0.05     3.01     3.19 1.00     8903     7808
sigma_logn2oeq_Intercept      -3.54      0.14    -3.81    -3.27 1.00     4065     5338
logn2o_surftemp               -0.02      0.00    -0.03    -0.02 1.00     5529     6684
logn2oeq_surftemp             -0.04      0.00    -0.04    -0.04 1.00     9722     7658
logn2oeq_log_elev             -0.07      0.01    -0.09    -0.06 1.00     9200     7505
logn2oeq_surftemp:log_elev     0.00      0.00     0.00     0.00 1.00     9527     8028
logn2o_mono3_cat               0.23      0.05     0.12     0.34 1.00     3948     4777

Simplex Parameters: 
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_mono3_cat1[1]     0.02      0.01     0.00     0.04 1.00     5535     5409
logn2o_mono3_cat1[2]     0.09      0.02     0.04     0.13 1.00     4281     5251
logn2o_mono3_cat1[3]     0.21      0.05     0.13     0.32 1.00     3683     4649
logn2o_mono3_cat1[4]     0.69      0.05     0.58     0.77 1.00     3498     4251

Residual Correlations: 
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq)     0.15      0.04     0.07     0.23 1.00    11381     7610

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

3.3.2 Model checks

3.3.2.1 Dissolved N2O

The PPCs below indicated a better fit compared to the previous models. The central tendency and tail behavior looked to be reasonably replicated by comparison. However, the observed vs. predicted plot suggested that larger observed values were being systematically underestimated.

3.3.2.2 Equilibrium N2O

The PPCs below indicated that this model appeared to be an improvement for equilibrium N2O as well. However, some checks (e.g., skewness) suggested some room for additional improvement.

3.3.2.3 Bivariate

The check for the joint distribution below also suggested an improvement upon the previous models.

3.3.2.4 Saturation

This model looked to be an improvement with regard to the PPCs for the saturation metrics. However, the proportion of under-saturated lakes remained biased low and other checks indicated that further improvements would be ideal.

3.3.2.5 R-square

The \(R^2\) estimates for this model are below and suggested substantial improvements on the previous models.

         Estimate Est.Error  Q2.5 Q97.5
R2logn2o    0.626     0.017 0.591  0.66
           Estimate Est.Error Q2.5 Q97.5
R2logn2oeq    0.879     0.004 0.87 0.886

3.3.3 Covariate effects

Below are plots illustrating the modeled effects of covariates on both N2O and equilibrium N2O.

N2O

The conditional effects plots below for N2O illustrate a positive, monotonic, and non-linear relationship between NO3 and N2O; and a negative, linear relationship between surface temperature and N2O.
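The non-linear shape of the monotonic NO3 effect can be reconstructed approximately from the population-level estimates in the summary above. The sketch assumes the usual parameterization of monotonic effects in brms, in which the coefficient is the average difference between adjacent categories and the simplex parameters give the share of the total effect attributed to each step. It uses rounded posterior means and ignores the group-level variation in this effect.

```r
b    <- 0.23                       # logn2o_mono3_cat (posterior mean)
zeta <- c(0.02, 0.09, 0.21, 0.69)  # simplex estimates (rounded posterior means)
D    <- length(zeta)               # number of steps between adjacent categories

# Cumulative shift in log(N2O) relative to the lowest NO3 category
shift <- c(0, b * D * cumsum(zeta))
names(shift) <- paste0("cat", seq_along(shift))
round(shift, 2)       # 0.00, 0.02, 0.10, 0.29, 0.93

# Corresponding multiplicative change in median N2O
round(exp(shift), 2)  # 1.00, 1.02, 1.11, 1.34, 2.53
```

Most of the estimated increase occurs in the final step, between the two highest NO3 categories, which is what makes the relationship strongly non-linear.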

3.3.3.1 Equilibrium N2O

The modeled effects below for the equilibrium N2O component of the model illustrated a negative relationship between equilibrium N2O and both predictors and an interaction such that the surface temperature effect became slightly steeper at lower elevations.

3.4 Model 4

In the next model, covariate terms were also included in the \(\sigma\) components of both submodels to better capture the remaining heterogeneity in the variances of N2O and N2O-eq.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ mo(no3_cat) +
               surftemp +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod4 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                # stricter sampler settings than the defaults
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 15851,
                chains = 4,
                iter = 5000,
                cores = 4)

save(n2o_mod4, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

3.4.1 Summarize fit

Below is a summary of the fitted parameters along with some convergence diagnostics.

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = log
         mu = identity; sigma = log 
Formula: log(n2o) ~ mo(no3_cat) + surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ mo(no3_cat) + surftemp + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
         log(n2o_eq) ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ surftemp + log_elev + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
   Data: df_model (Number of observations: 984) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Priors: 
b_logn2o ~ normal(0, 1)
b_logn2o_sigma ~ normal(0, 1)
b_logn2oeq ~ normal(0, 1)
b_logn2oeq_sigma ~ normal(0, 1)
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
simo_logn2o_mono3_cat1 ~ dirichlet(1)
simo_logn2o_sigma_mono3_cat1 ~ dirichlet(1)

Group-Level Effects: 
~WSA9 (Number of levels: 9) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(logn2o_Intercept)                        0.050     0.020    0.020    0.100 1.000
sd(logn2o_mono3_cat)                        0.145     0.054    0.065    0.272 1.000
sd(logn2oeq_Intercept)                      0.036     0.012    0.019    0.065 1.001
sd(sigma_logn2o_Intercept)                  0.113     0.080    0.005    0.303 1.001
sd(sigma_logn2oeq_Intercept)                0.208     0.098    0.039    0.435 1.001
cor(logn2o_Intercept,logn2o_mono3_cat)     -0.186     0.327   -0.755    0.489 1.000
cor(logn2o_Intercept,logn2oeq_Intercept)    0.344     0.290   -0.277    0.817 1.000
cor(logn2o_mono3_cat,logn2oeq_Intercept)    0.364     0.286   -0.269    0.819 1.001
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         4733     4348
sd(logn2o_mono3_cat)                         4386     6208
sd(logn2oeq_Intercept)                       3977     5215
sd(sigma_logn2o_Intercept)                   2543     4693
sd(sigma_logn2oeq_Intercept)                 2777     2177
cor(logn2o_Intercept,logn2o_mono3_cat)       4616     6092
cor(logn2o_Intercept,logn2oeq_Intercept)     5667     6074
cor(logn2o_mono3_cat,logn2oeq_Intercept)     7634     7624

~WSA9:state (Number of levels: 96) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(logn2o_Intercept)                        0.035     0.017    0.003    0.068 1.001
sd(logn2o_mono3_cat)                        0.117     0.022    0.076    0.162 1.001
sd(logn2oeq_Intercept)                      0.033     0.003    0.027    0.040 1.000
sd(sigma_logn2o_Intercept)                  0.181     0.099    0.011    0.374 1.004
sd(sigma_logn2oeq_Intercept)                0.287     0.057    0.177    0.403 1.001
cor(logn2o_Intercept,logn2o_mono3_cat)     -0.317     0.311   -0.810    0.405 1.003
cor(logn2o_Intercept,logn2oeq_Intercept)    0.025     0.265   -0.497    0.557 1.005
cor(logn2o_mono3_cat,logn2oeq_Intercept)    0.448     0.165    0.098    0.738 1.001
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                          981     2017
sd(logn2o_mono3_cat)                         3149     4612
sd(logn2oeq_Intercept)                       3457     6360
sd(sigma_logn2o_Intercept)                    858     2373
sd(sigma_logn2oeq_Intercept)                 2470     3862
cor(logn2o_Intercept,logn2o_mono3_cat)        945     1286
cor(logn2o_Intercept,logn2oeq_Intercept)      556      823
cor(logn2o_mono3_cat,logn2oeq_Intercept)     1518     3021

~WSA9:state:size_cat (Number of levels: 352) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(logn2o_Intercept)                        0.065     0.011    0.042    0.087 1.003
sd(logn2oeq_Intercept)                      0.004     0.002    0.000    0.008 1.001
sd(sigma_logn2o_Intercept)                  0.539     0.055    0.432    0.647 1.002
sd(sigma_logn2oeq_Intercept)                0.263     0.051    0.162    0.363 1.001
cor(logn2o_Intercept,logn2oeq_Intercept)    0.395     0.348   -0.479    0.904 1.001
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         1478     2832
sd(logn2oeq_Intercept)                       1506     2453
sd(sigma_logn2o_Intercept)                   1443     3611
sd(sigma_logn2oeq_Intercept)                 2372     4366
cor(logn2o_Intercept,logn2oeq_Intercept)     4231     4956

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
logn2o_Intercept              2.386     0.055    2.278    2.490 1.000     5971     7057
sigma_logn2o_Intercept       -1.855     0.281   -2.389   -1.294 1.000     5965     7704
logn2oeq_Intercept            3.115     0.051    3.016    3.217 1.001     7109     7366
sigma_logn2oeq_Intercept     -1.922     0.373   -2.634   -1.180 1.000     8445     7581
logn2o_surftemp              -0.021     0.002   -0.026   -0.017 1.000     5850     7677
sigma_logn2o_surftemp        -0.001     0.011   -0.023    0.021 1.001     6175     7768
logn2oeq_surftemp            -0.042     0.002   -0.046   -0.039 1.001     8285     7516
logn2oeq_log_elev            -0.080     0.008   -0.096   -0.065 1.001     7656     7564
logn2oeq_surftemp:log_elev    0.002     0.000    0.002    0.003 1.000     8407     7502
sigma_logn2oeq_surftemp      -0.065     0.010   -0.085   -0.046 1.000    10043     8163
sigma_logn2oeq_log_elev      -0.019     0.038   -0.095    0.054 1.000     6646     7910
logn2o_mono3_cat              0.225     0.058    0.108    0.340 1.000     4488     5143
sigma_logn2o_mono3_cat        0.256     0.037    0.187    0.331 1.000     6036     7643

Simplex Parameters: 
                           Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
logn2o_mono3_cat1[1]          0.018     0.012    0.001    0.045 1.000     5709     4429
logn2o_mono3_cat1[2]          0.093     0.027    0.041    0.149 1.000     5034     5645
logn2o_mono3_cat1[3]          0.231     0.060    0.126    0.366 1.000     4974     5478
logn2o_mono3_cat1[4]          0.659     0.062    0.520    0.763 1.000     4446     5210
sigma_logn2o_mono3_cat1[1]    0.107     0.066    0.007    0.255 1.000     6168     4471
sigma_logn2o_mono3_cat1[2]    0.129     0.087    0.006    0.328 1.000     8105     5010
sigma_logn2o_mono3_cat1[3]    0.459     0.151    0.168    0.758 1.000     7272     5802
sigma_logn2o_mono3_cat1[4]    0.304     0.139    0.038    0.569 1.000     6289     4264

Residual Correlations: 
                        Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq)    0.146     0.040    0.067    0.223 1.001     9839     8508

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

3.4.2 Model checks

The same PPCs as above were employed for this model.

For dissolved N2O, this model again appeared to be an improvement on the previous model, particularly with regard to the more constant variance indicated in the observed vs. predicted plot (bottom, right panel).

3.4.2.1 Equilibrium N2O

This component of the model also seemed to be an improvement over model 3, with better representation in the tails as indicated in the skewness vs. kurtosis PPC.

3.4.2.2 Bivariate

The bivariate check again indicated an improvement over the previous model, with a tighter fit of the PPC to the observed bivariate density.

3.4.2.3 Saturation

This check also suggested an improvement over the previous models, with better tail behavior and less bias in the proportion under-saturated measure.

3.4.2.4 R-square

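The Bayesian \(R^2\) estimates below were similar to, though marginally lower than, those for model 3 (0.606 vs. 0.626 for N2O and 0.875 vs. 0.879 for N2O-eq), suggesting that the gains from this model were in the variance structure rather than in the variance explained by the mean terms.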

         Estimate Est.Error  Q2.5 Q97.5
R2logn2o    0.606     0.025 0.551 0.651
           Estimate Est.Error  Q2.5 Q97.5
R2logn2oeq    0.875     0.005 0.865 0.883

3.4.3 Covariate effects

3.4.3.1 N2O

The conditional effects plots for the covariate effects on N2O remained largely unchanged from the previous model.

Below are estimates of the conditional effects of the covariates on \(\sigma\) for N2O. These plots suggested a large effect of NO3 on the variance of N2O, but little to no effect of surface temperature.
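
The size of the NO3 effects on both the mean and \(\sigma\) can be read off the summary table with a little arithmetic. The sketch below plugs in the posterior means reported above for Model 4 and assumes the usual brms parameterization of a monotonic term, in which the effect at a given level is the coefficient times the number of steps (here 4, inferred from the four simplex elements) times the cumulative sum of the simplex. Plugging in posterior means only approximates the posterior mean of the effect, and the group-level variation in the NO3 slope is ignored. The helper `mo_effect` is a hypothetical name.

```r
# Posterior means copied from the Model 4 summary (plug-in values)
b_mu <- 0.225                               # logn2o_mono3_cat
zeta_mu <- c(0.018, 0.093, 0.231, 0.659)    # logn2o_mono3_cat1[1..4]
b_sigma <- 0.256                            # sigma_logn2o_mono3_cat
zeta_sigma <- c(0.107, 0.129, 0.459, 0.304) # sigma_logn2o_mono3_cat1[1..4]

# Monotonic term: b * D * cumulative simplex, with 0 at the lowest level
mo_effect <- function(b, zeta) {
  D <- length(zeta)
  c(0, b * D * cumsum(zeta))
}

shift_log_n2o <- mo_effect(b_mu, zeta_mu)          # shift in mean of log(N2O)
shift_log_sigma <- mo_effect(b_sigma, zeta_sigma)  # shift in log(sigma)

# Both quantities are on a log scale, so exponentiating gives ratios
# relative to the lowest NO3 level
effects <- data.frame(
  no3_level = 0:4,
  n2o_ratio = exp(shift_log_n2o),
  sigma_ratio = exp(shift_log_sigma)
)
```

Under these plug-in values, moving from the lowest to the highest NO3 level multiplies the median of N2O by roughly 2.5 and the residual standard deviation of log(N2O) by roughly 2.8, with most of the change in the mean occurring at the last step.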

3.4.3.2 Equilibrium N2O

The covariate effects on equilibrium N2O remained largely the same as for the previous model.

The covariate effects on \(\sigma\) for N2O-eq suggested a negative effect of surface temperature and little to no effect of elevation.

3.5 Model 5

In the next model, more complexity was added to the N2O component by including a covariate for lake surface area (log scale) as well as interactions of NO3 with both log(area) and surface temperature.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               log_area +
               surftemp + 
               mo(no3_cat):log_area +
               mo(no3_cat):surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ log_area +
               mo(no3_cat) +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod5 <- brm(bf_n2o +
                  bf_n2oeq +
                  set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                # stricter sampler settings than the defaults
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 54741,
                chains = 4,
                iter = 5000,
                cores = 4)

save(n2o_mod5, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

3.5.1 Summarize fit

Below is a summary of the fitted parameters along with MCMC convergence diagnostics.

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = log
         mu = identity; sigma = log 
Formula: log(n2o) ~ mo(no3_cat) + log_area + surftemp + mo(no3_cat):log_area + mo(no3_cat):surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ log_area + mo(no3_cat) + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
         log(n2o_eq) ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         sigma ~ surftemp + log_elev + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
   Data: df_model (Number of observations: 984) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Priors: 
b_logn2o ~ normal(0, 1)
b_logn2o_sigma ~ normal(0, 1)
b_logn2oeq ~ normal(0, 1)
b_logn2oeq_sigma ~ normal(0, 1)
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
simo_logn2o_mono3_cat:log_area1 ~ dirichlet(1)
simo_logn2o_mono3_cat:surftemp1 ~ dirichlet(1)
simo_logn2o_mono3_cat1 ~ dirichlet(1)
simo_logn2o_sigma_mono3_cat1 ~ dirichlet(1)

Group-Level Effects: 
~WSA9 (Number of levels: 9) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(logn2o_Intercept)                        0.048     0.019    0.020    0.094 1.000
sd(logn2o_mono3_cat)                        0.081     0.050    0.007    0.199 1.003
sd(logn2oeq_Intercept)                      0.034     0.011    0.018    0.062 1.000
sd(sigma_logn2o_Intercept)                  0.111     0.076    0.006    0.293 1.000
sd(sigma_logn2oeq_Intercept)                0.209     0.102    0.036    0.445 1.001
cor(logn2o_Intercept,logn2o_mono3_cat)     -0.056     0.360   -0.713    0.644 1.000
cor(logn2o_Intercept,logn2oeq_Intercept)    0.464     0.284   -0.173    0.885 1.000
cor(logn2o_mono3_cat,logn2oeq_Intercept)    0.259     0.341   -0.467    0.824 1.000
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         4413     4624
sd(logn2o_mono3_cat)                         1412     2265
sd(logn2oeq_Intercept)                       4061     6633
sd(sigma_logn2o_Intercept)                   2790     4064
sd(sigma_logn2oeq_Intercept)                 2176     1898
cor(logn2o_Intercept,logn2o_mono3_cat)       5823     6525
cor(logn2o_Intercept,logn2oeq_Intercept)     4935     6214
cor(logn2o_mono3_cat,logn2oeq_Intercept)     4684     4770

~WSA9:state (Number of levels: 96) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(logn2o_Intercept)                        0.046     0.014    0.016    0.072 1.001
sd(logn2o_mono3_cat)                        0.097     0.024    0.053    0.146 1.001
sd(logn2oeq_Intercept)                      0.033     0.003    0.027    0.040 1.001
sd(sigma_logn2o_Intercept)                  0.207     0.088    0.024    0.369 1.007
sd(sigma_logn2oeq_Intercept)                0.285     0.056    0.177    0.395 1.002
cor(logn2o_Intercept,logn2o_mono3_cat)     -0.265     0.284   -0.762    0.336 1.001
cor(logn2o_Intercept,logn2oeq_Intercept)    0.163     0.200   -0.232    0.559 1.005
cor(logn2o_mono3_cat,logn2oeq_Intercept)    0.266     0.209   -0.155    0.649 1.002
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                         1323     1765
sd(logn2o_mono3_cat)                         2612     2569
sd(logn2oeq_Intercept)                       3321     5286
sd(sigma_logn2o_Intercept)                    776     1082
sd(sigma_logn2oeq_Intercept)                 2464     3702
cor(logn2o_Intercept,logn2o_mono3_cat)       1708     3060
cor(logn2o_Intercept,logn2oeq_Intercept)      960     1715
cor(logn2o_mono3_cat,logn2oeq_Intercept)      883     1843

~WSA9:state:size_cat (Number of levels: 352) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(logn2o_Intercept)                        0.035     0.015    0.004    0.062 1.004
sd(logn2oeq_Intercept)                      0.003     0.002    0.000    0.007 1.003
sd(sigma_logn2o_Intercept)                  0.479     0.056    0.374    0.591 1.006
sd(sigma_logn2oeq_Intercept)                0.260     0.052    0.160    0.360 1.001
cor(logn2o_Intercept,logn2oeq_Intercept)    0.235     0.413   -0.658    0.872 1.002
                                         Bulk_ESS Tail_ESS
sd(logn2o_Intercept)                          819     1375
sd(logn2oeq_Intercept)                       1569     3553
sd(sigma_logn2o_Intercept)                   1318     3346
sd(sigma_logn2oeq_Intercept)                 2326     3796
cor(logn2o_Intercept,logn2oeq_Intercept)     2950     5425

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
logn2o_Intercept              2.380     0.055    2.273    2.487 1.001     4020     6632
sigma_logn2o_Intercept       -1.596     0.097   -1.789   -1.408 1.002     4961     6145
logn2oeq_Intercept            3.116     0.051    3.019    3.218 1.000     6860     7546
sigma_logn2oeq_Intercept     -1.922     0.374   -2.639   -1.179 1.000     7865     7774
logn2o_log_area               0.029     0.003    0.023    0.034 1.000     9054     8090
logn2o_surftemp              -0.025     0.002   -0.029   -0.021 1.000     3996     7106
sigma_logn2o_log_area        -0.095     0.020   -0.135   -0.055 1.001     6068     7491
logn2oeq_surftemp            -0.042     0.002   -0.046   -0.039 1.000     7418     8290
logn2oeq_log_elev            -0.080     0.008   -0.097   -0.065 1.000     6834     7865
logn2oeq_surftemp:log_elev    0.003     0.000    0.002    0.003 1.000     7339     7992
sigma_logn2oeq_surftemp      -0.065     0.010   -0.085   -0.046 1.000     9671     8132
sigma_logn2oeq_log_elev      -0.018     0.038   -0.096    0.054 1.000     6265     7038
logn2o_mono3_cat              0.026     0.127   -0.226    0.275 1.004     1971     3252
logn2o_mono3_cat:log_area    -0.036     0.010   -0.054   -0.016 1.001     2602     4549
logn2o_mono3_cat:surftemp     0.014     0.006    0.003    0.026 1.004     1478     2416
sigma_logn2o_mono3_cat        0.246     0.036    0.179    0.321 1.001     4246     6738

Simplex Parameters: 
                              Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS
logn2o_mono3_cat1[1]             0.026     0.024    0.001    0.089 1.001     4261
logn2o_mono3_cat1[2]             0.091     0.066    0.005    0.252 1.001     1875
logn2o_mono3_cat1[3]             0.250     0.146    0.034    0.615 1.002     2027
logn2o_mono3_cat1[4]             0.633     0.166    0.206    0.867 1.003     1411
logn2o_mono3_cat:log_area1[1]    0.060     0.041    0.005    0.161 1.000     4395
logn2o_mono3_cat:log_area1[2]    0.042     0.039    0.001    0.142 1.000     7129
logn2o_mono3_cat:log_area1[3]    0.297     0.161    0.047    0.681 1.000     5220
logn2o_mono3_cat:log_area1[4]    0.602     0.176    0.166    0.865 1.000     4431
logn2o_mono3_cat:surftemp1[1]    0.043     0.040    0.006    0.144 1.002     3541
logn2o_mono3_cat:surftemp1[2]    0.080     0.056    0.015    0.222 1.001     3667
logn2o_mono3_cat:surftemp1[3]    0.276     0.111    0.105    0.573 1.001     3634
logn2o_mono3_cat:surftemp1[4]    0.601     0.142    0.168    0.785 1.002     2731
sigma_logn2o_mono3_cat1[1]       0.131     0.074    0.010    0.288 1.000     6840
sigma_logn2o_mono3_cat1[2]       0.151     0.096    0.010    0.367 1.000     7630
sigma_logn2o_mono3_cat1[3]       0.444     0.149    0.149    0.734 1.001     6331
sigma_logn2o_mono3_cat1[4]       0.275     0.138    0.026    0.544 1.001     4902
                              Tail_ESS
logn2o_mono3_cat1[1]              5305
logn2o_mono3_cat1[2]              3916
logn2o_mono3_cat1[3]              2856
logn2o_mono3_cat1[4]              2439
logn2o_mono3_cat:log_area1[1]     3509
logn2o_mono3_cat:log_area1[2]     6046
logn2o_mono3_cat:log_area1[3]     5683
logn2o_mono3_cat:log_area1[4]     4505
logn2o_mono3_cat:surftemp1[1]     3002
logn2o_mono3_cat:surftemp1[2]     2580
logn2o_mono3_cat:surftemp1[3]     3352
logn2o_mono3_cat:surftemp1[4]     2289
sigma_logn2o_mono3_cat1[1]        4269
sigma_logn2o_mono3_cat1[2]        5795
sigma_logn2o_mono3_cat1[3]        6964
sigma_logn2o_mono3_cat1[4]        5797

Residual Correlations: 
                        Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq)    0.141     0.038    0.066    0.216 1.000    11829     8204

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

3.5.2 Model checks

The same PPCs as above were again performed for this model.

The PPC for dissolved N2O looked similar to that for the previous model.

3.5.2.1 Equilibrium N2O

Again, the PPCs for this model were similar to the previous model, which was unsurprising given that it was the same model for N2O-eq.

3.5.2.2 Bivariate

This PPC was also similar to the previous model.

3.5.2.3 Saturation

This check was also similar to that for the previous model, with perhaps slightly less bias in the estimated proportion of under-saturated lakes. There was also a potentially concerning extreme prediction in the observed vs. predicted PPC.

3.5.2.4 R-square

         Estimate Est.Error  Q2.5 Q97.5
R2logn2o    0.629      0.03 0.563  0.68
           Estimate Est.Error  Q2.5 Q97.5
R2logn2oeq    0.874     0.005 0.864 0.882

3.5.3 Covariate effects

3.5.3.1 N2O

The conditional effects plot for the covariate effects on N2O suggested a similar effect of NO3, but also interactions of NO3 with both lake area and surface temperature. For lake area, the effect was estimated to be largest and negative at the highest levels of NO3, and slightly positive at the lowest level of NO3 (the log(area) coefficient at the lowest level was 0.029 in the summary above). For surface temperature, the effect was estimated to be largest and positive at the highest level of NO3, and negative at the lowest level of NO3.
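
The interaction pattern can be checked against the Model 5 summary by computing the implied slope of each covariate at each NO3 level. The sketch below plugs in the posterior means from the summary above and assumes the usual brms parameterization, in which the slope at a given NO3 level is the main-effect coefficient plus the interaction coefficient times the number of steps times the cumulative sum of the interaction simplex. Plug-in posterior means give only an approximation, and `slope_by_level` is a hypothetical helper name.

```r
# Posterior means copied from the Model 5 summary (plug-in values)
b_area <- 0.029                              # logn2o_log_area
b_area_no3 <- -0.036                         # logn2o_mono3_cat:log_area
zeta_area <- c(0.060, 0.042, 0.297, 0.602)   # logn2o_mono3_cat:log_area1[1..4]

b_temp <- -0.025                             # logn2o_surftemp
b_temp_no3 <- 0.014                          # logn2o_mono3_cat:surftemp
zeta_temp <- c(0.043, 0.080, 0.276, 0.601)   # logn2o_mono3_cat:surftemp1[1..4]

# Slope of a covariate at each NO3 level (level 0 = lowest category)
slope_by_level <- function(b_main, b_int, zeta) {
  D <- length(zeta)
  b_main + b_int * D * c(0, cumsum(zeta))
}

slopes <- data.frame(
  no3_level = 0:4,
  log_area = slope_by_level(b_area, b_area_no3, zeta_area),
  surftemp = slope_by_level(b_temp, b_temp_no3, zeta_temp)
)
```

Under these plug-in values, the log(area) slope runs from about 0.03 at the lowest NO3 level to about -0.12 at the highest, and the surface temperature slope runs from about -0.025 to about 0.03.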

The estimated covariate effects on \(\sigma\) suggested a negative relationship with log(area) and a positive relationship, again, with NO3.

3.5.3.2 Equilibrium N2O

The estimated covariate effects on equilibrium N2O remained largely the same as estimated in the previous model.

3.6 A Final Model

As demonstrated above, models excluding the NO3 covariate consistently resulted in poorer fits to the observed dissolved N2O data. Including surface temperature and elevation in the equilibrium N2O part of the model resulted in substantially improved replication of key aspects of the observed data. Likewise, added flexibility in the distributional terms for both dissolved and equilibrium N2O led to improvements.

To make inferences from this model for N2O in the population of interest, however, the included covariates needed to be (1) fully observed across that population or (2) their missingness needed to be modeled. For the lake area and elevation covariates, data were available for all lakes from previously compiled geospatial databases. However, neither surface temperature nor NO3 was observed for lakes outside of the sample. Both were therefore only partially observed with respect to the target population, and their missingness needed to be accounted for in a model. Therefore, a more complex model was constructed below that included surface temperature and NO3 as additional responses conditioned on the survey design variables and fully observed covariates. This approach to inference for N2O was similar to a Bayesian structural equation model (Merkle et al. 2021; Merkle and Rosseel 2018). The main details of the logical dependence structure could be characterized as:

\[\begin{align} \color{#1F449C}{\boldsymbol{N_2O_{diss}}} &=\sim Survey + Area + \color{#F05039}{\boldsymbol{NO_3}} + \color{#EEBAB4}{\boldsymbol{Temp}} \\ \color{#A8B6CC}{\boldsymbol{N_2O_{equil}}} &=\sim Survey + Elev + \color{#EEBAB4}{\boldsymbol{Temp}}\\ \color{#F05039}{\boldsymbol{NO_3}} &=\sim Survey + Area + \color{#EEBAB4}{\boldsymbol{Temp}} \\ \color{#EEBAB4}{\boldsymbol{Temp}} &=\sim Survey + Lat + Elev + Day \end{align}\]

Variables in color text above were treated as partially observed with respect to the population of interest (i.e., observed only in the sample), whereas variables in black text were considered fully observed. The partially observed variables (dissolved and equilibrium N2O, NO3, and surface temperature) were each modeled conditional on the survey design variables and other partially and/or fully observed covariates. This structural equation approach required a more complex set of post-processing steps compared to a typical MRP analysis. To propagate estimates and uncertainty through the dependency structure, the fitted model was first used to predict surface temperature in the target population, since it depended only on the fully observed covariates. That predictive distribution was then used alongside the relevant fully observed covariates to predict NO3 in the target population. Finally, the predictive distributions for temperature and NO3 were used to predict the N2O responses. These steps were carried out in the “Predict to population” section to follow.
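
The order of the prediction steps described above can be sketched with toy numbers. The sketch below is not the post-processing code from the "Predict to population" section. Every coefficient, threshold, shape value, and covariate value is a hypothetical stand-in, the group-level terms are omitted, and the NO3 effect is simplified to a linear term. It only illustrates how each posterior draw is carried from temperature to NO3 to N2O so that uncertainty accumulates along the chain.

```r
set.seed(2017)
n_draws <- 200   # posterior draws (hypothetical)
n_lakes <- 50    # lakes in a toy target population

# Fully observed covariates for the target population (toy values)
lat <- runif(n_lakes, 25, 49)
log_area <- rnorm(n_lakes, 2, 1)

# Hypothetical posterior draws; in practice these come from the fitted model
b_temp <- cbind(rnorm(n_draws, 4.0, 0.02), rnorm(n_draws, -0.025, 0.0005))
b_no3 <- cbind(rnorm(n_draws, -0.05, 0.01), rnorm(n_draws, -0.2, 0.05))
tau <- c(-2, -1, 0, 1)   # ordered thresholds separating 5 NO3 levels
b_n2o <- cbind(rnorm(n_draws, 2.4, 0.05), rnorm(n_draws, 0.2, 0.02),
               rnorm(n_draws, -0.02, 0.002))
shape <- 50

# Gamma draws with mean mu (rate = shape / mu), returned as draws x lakes
rgamma_mu <- function(mu, shape) {
  matrix(rgamma(length(mu), shape = shape, rate = shape / mu),
         nrow(mu), ncol(mu))
}

# Step 1: temperature depends only on fully observed covariates
mu_temp <- exp(b_temp[, 1] + outer(b_temp[, 2], lat))
temp <- rgamma_mu(mu_temp, shape)

# Step 2: NO3 level depends on predicted temperature and lake area.
# Cumulative logit draws via a latent logistic variable cut at the thresholds.
eta_no3 <- b_no3[, 1] * temp + outer(b_no3[, 2], log_area)
no3 <- matrix(findInterval(eta_no3 + rlogis(length(eta_no3)), tau),
              n_draws, n_lakes)   # levels coded 0..4

# Step 3: N2O depends on predicted NO3 and predicted temperature
mu_n2o <- exp(b_n2o[, 1] + b_n2o[, 2] * no3 + b_n2o[, 3] * temp)
n2o <- rgamma_mu(mu_n2o, shape)

# Population-level summary: one mean per draw, then posterior quantiles
pop_mean <- rowMeans(n2o)
pop_mean_ci <- quantile(pop_mean, c(0.025, 0.5, 0.975))
```

Each row of `temp`, `no3`, and `n2o` corresponds to the same posterior draw, which is what allows the uncertainty in the upstream predictions to carry through to the N2O summaries.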

In the final model below, the submodel for surface temperature assumed Gamma-distributed errors, and the linear predictor included the survey design variables, latitude, and smooth terms for elevation (log scale) and Julian date. The shape parameter was also modeled as a function of latitude to address increasing response variance along the latitudinal gradient. The NO3 submodel was a cumulative logit formulation, and the linear predictor included all of the survey factors as well as surface temperature and lake area.
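
The cumulative logit formulation for the ordinal NO3 response can be illustrated as follows. The sketch assumes the standard parameterization in which the probability of being at or below level k is the inverse logit of the k-th threshold minus the linear predictor. The thresholds, linear predictor values, and the function name `cum_logit_probs` are hypothetical.

```r
# Category probabilities under a cumulative logit model.
# eta: linear predictor; tau: ordered thresholds (K - 1 values for K levels).
cum_logit_probs <- function(eta, tau) {
  cum <- c(plogis(tau - eta), 1)   # P(Y <= k) for k = 1..K
  diff(c(0, cum))                  # P(Y = k)
}

tau <- c(-1, 0.5, 2, 3.5)          # hypothetical thresholds for 5 NO3 levels

p_ref <- cum_logit_probs(eta = 0, tau = tau)
p_shift <- cum_logit_probs(eta = 1.5, tau = tau)
```

A larger linear predictor moves probability mass toward the higher NO3 levels, while the thresholds themselves are left free to be unequally spaced (the `threshold = "flexible"` option in the model code).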

The N2O and N2O-eq responses were each modeled with Gamma distributed errors, but with the same covariate structure as in model 5. The same structure was also employed for the shape terms in these responses, corresponding to the \(\sigma\) terms in the previous model. Though not shown in this document, the Gamma error structure appeared to result in slightly better performance in the predictive checks compared to the Gaussian errors in previous models. This was primarily apparent in the saturation ratio checks, which may have been more sensitive to model performance in the tails of the N2O responses. Others have also indicated that the Gamma error distribution can work well for dissolved N2O data (Webb et al. 2019).
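
The correspondence between the Gamma shape terms and the earlier \(\sigma\) terms can be made concrete. The sketch below assumes the mean and shape parameterization of the Gamma family (rate equal to shape divided by the mean), under which the coefficient of variation is one over the square root of the shape. The numerical values are arbitrary.

```r
mu <- 10                    # mean on the response scale (arbitrary)
shape <- c(5, 50, 500)      # three arbitrary shape values
rate <- shape / mu          # mean and shape parameterization

gamma_mean <- shape / rate           # equals mu for every shape
gamma_sd <- sqrt(shape) / rate       # equals mu / sqrt(shape)
cv <- gamma_sd / gamma_mean          # equals 1 / sqrt(shape)
```

Because a larger shape implies a smaller relative variance, a covariate that increased \(\sigma\) in the Gaussian models would be expected to have a negative coefficient in the corresponding shape term.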

Note that there was no residual correlation term for this model, since observation-level residual correlations are not defined for the Gamma and cumulative logit families (brms estimates them only for Gaussian and Student-t responses). Dropping the observation-level residual correlation term was deemed a reasonable compromise that enabled modeling the missingness of NO3, in particular. Nevertheless, the random intercepts again allowed for potential correlations between responses at the group levels.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(n2o ~ mo(no3_cat) +
               log_area +
               surftemp + 
               mo(no3_cat):log_area +
               mo(no3_cat):surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             shape ~ log_area +
               mo(no3_cat) +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = Gamma(link = "log"))

bf_n2oeq <- bf(n2o_eq ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             shape ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = Gamma(link = "log"))

bf_temp <- bf(surftemp ~ lat +
                s(log_elev) +
                s(jdate) +
                (1 | a | WSA9) + 
                (1 | b | WSA9:state) +
                (1 | c | WSA9:state:size_cat),
              shape ~ lat,
              family = Gamma(link = "log"))

bf_no3 <- bf(no3_cat ~ surftemp +
               log_area +
               (1 | a | WSA9) +
               (1 | b | WSA9:state) +
               (1 | c | WSA9:state:size_cat),
             family = cumulative(link = "logit", threshold="flexible"))

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "n2o"),
  prior(normal(0, 1), class = "b", resp = "n2o"),
  prior(exponential(2), class = "sd", resp = "n2o"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2o"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2o"),
  prior(exponential(2), class = "sd", dpar = "shape", resp = "n2o"),
  
  prior(normal(2, 1), class = "Intercept", resp = "n2oeq"), 
  prior(normal(0, 1), class = "b", resp = "n2oeq"),  
  prior(exponential(2), class = "sd", resp = "n2oeq"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2oeq"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2oeq"),
  prior(exponential(2), class = "sd", dpar = "shape", resp = "n2oeq"),
  
  prior(normal(3, 1), class = "Intercept", resp = "surftemp"), 
  prior(normal(0, 1), class = "b", resp = "surftemp"), 
  prior(exponential(0.5), class = "sds", resp = "surftemp"),
  prior(exponential(2), class = "sd", resp = "surftemp"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "surftemp"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "surftemp"),
  
  prior(normal(0, 3), class = "Intercept", resp = "no3cat"),
  prior(normal(0, 1), class = "b", resp = "no3cat"),
  prior(exponential(1), class = "sd", resp = "no3cat"),
  
  prior(lkj(2), class = "cor")
  )

n2o_mod6 <- brm(bf_n2o + 
                  bf_n2oeq + 
                  bf_temp + 
                  bf_no3 + 
                  set_rescor(rescor = FALSE),
                data = df_model, 
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 14),
                # sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 85132, # 14548,
                # init = my_inits,
                init_r = 0.5,
                chains = 4, 
                iter = 5000, 
                cores = 4)

save(n2o_mod6, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

3.6.1 Summarize fit

Below is a summary of the fitted parameters and MCMC diagnostics.

 Family: MV(gamma, gamma, gamma, cumulative) 
  Links: mu = log; shape = log
         mu = log; shape = log
         mu = log; shape = log
         mu = logit; disc = identity 
Formula: n2o ~ mo(no3_cat) + log_area + surftemp + mo(no3_cat):log_area + mo(no3_cat):surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         shape ~ log_area + mo(no3_cat) + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
         n2o_eq ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         shape ~ surftemp + log_elev + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
         surftemp ~ lat + s(log_elev) + s(jdate) + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
         shape ~ lat
         no3_cat ~ surftemp + log_area + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat) 
   Data: df_model (Number of observations: 984) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Priors: 
b_n2o ~ normal(0, 1)
b_n2o_shape ~ normal(0, 1)
b_n2oeq ~ normal(0, 1)
b_n2oeq_shape ~ normal(0, 1)
b_no3cat ~ normal(0, 1)
b_surftemp ~ normal(0, 1)
b_surftemp_shape ~ normal(0, 1)
Intercept_n2o ~ normal(2, 1)
Intercept_n2o_shape ~ normal(5, 4)
Intercept_n2oeq ~ normal(2, 1)
Intercept_n2oeq_shape ~ normal(5, 4)
Intercept_no3cat ~ normal(0, 3)
Intercept_surftemp ~ normal(3, 1)
Intercept_surftemp_shape ~ normal(5, 4)
L ~ lkj_corr_cholesky(2)
<lower=0> sd_n2o ~ exponential(2)
<lower=0> sd_n2o_shape ~ exponential(2)
<lower=0> sd_n2oeq ~ exponential(2)
<lower=0> sd_n2oeq_shape ~ exponential(2)
<lower=0> sd_no3cat ~ exponential(1)
<lower=0> sd_surftemp ~ exponential(2)
<lower=0> sds_surftemp ~ exponential(0.5)
simo_n2o_mono3_cat:log_area1 ~ dirichlet(1)
simo_n2o_mono3_cat:surftemp1 ~ dirichlet(1)
simo_n2o_mono3_cat1 ~ dirichlet(1)
simo_n2o_shape_mono3_cat1 ~ dirichlet(1)

Smooth Terms: 
                          Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
sds(surftemp_slog_elev_1)    1.161     0.370    0.638    2.079 1.001     2264     4174
sds(surftemp_sjdate_1)       0.571     0.277    0.226    1.273 1.000     2679     4853

Group-Level Effects: 
~WSA9 (Number of levels: 9) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(n2o_Intercept)                           0.048     0.018    0.020    0.091 1.000
sd(n2o_mono3_cat)                           0.045     0.035    0.002    0.129 1.003
sd(n2oeq_Intercept)                         0.033     0.011    0.018    0.061 1.000
sd(surftemp_Intercept)                      0.031     0.014    0.011    0.064 1.001
sd(no3cat_Intercept)                        0.690     0.256    0.296    1.308 1.000
sd(shape_n2o_Intercept)                     0.211     0.141    0.011    0.535 1.000
sd(shape_n2oeq_Intercept)                   0.402     0.183    0.075    0.814 1.001
cor(n2o_Intercept,n2o_mono3_cat)           -0.049     0.336   -0.667    0.614 1.000
cor(n2o_Intercept,n2oeq_Intercept)          0.393     0.268   -0.202    0.824 1.000
cor(n2o_mono3_cat,n2oeq_Intercept)          0.102     0.328   -0.559    0.686 1.000
cor(n2o_Intercept,surftemp_Intercept)      -0.350     0.296   -0.839    0.285 1.000
cor(n2o_mono3_cat,surftemp_Intercept)       0.089     0.333   -0.564    0.703 1.000
cor(n2oeq_Intercept,surftemp_Intercept)    -0.167     0.299   -0.705    0.437 1.000
cor(n2o_Intercept,no3cat_Intercept)        -0.057     0.295   -0.605    0.522 1.000
cor(n2o_mono3_cat,no3cat_Intercept)         0.141     0.333   -0.539    0.724 1.002
cor(n2oeq_Intercept,no3cat_Intercept)       0.272     0.274   -0.305    0.740 1.001
cor(surftemp_Intercept,no3cat_Intercept)    0.185     0.300   -0.436    0.718 1.000
                                         Bulk_ESS Tail_ESS
sd(n2o_Intercept)                            3011     2931
sd(n2o_mono3_cat)                            1540     3502
sd(n2oeq_Intercept)                          3147     4572
sd(surftemp_Intercept)                       3705     4498
sd(no3cat_Intercept)                         4184     5618
sd(shape_n2o_Intercept)                      2379     3641
sd(shape_n2oeq_Intercept)                    2208     1617
cor(n2o_Intercept,n2o_mono3_cat)             6558     6134
cor(n2o_Intercept,n2oeq_Intercept)           4604     6010
cor(n2o_mono3_cat,n2oeq_Intercept)           3241     5029
cor(n2o_Intercept,surftemp_Intercept)        4744     5916
cor(n2o_mono3_cat,surftemp_Intercept)        5246     6447
cor(n2oeq_Intercept,surftemp_Intercept)      7342     7508
cor(n2o_Intercept,no3cat_Intercept)          5152     6479
cor(n2o_mono3_cat,no3cat_Intercept)          3435     5897
cor(n2oeq_Intercept,no3cat_Intercept)        6449     7400
cor(surftemp_Intercept,no3cat_Intercept)     6490     7928

~WSA9:state (Number of levels: 96) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(n2o_Intercept)                           0.047     0.013    0.021    0.072 1.004
sd(n2o_mono3_cat)                           0.101     0.019    0.068    0.142 1.001
sd(n2oeq_Intercept)                         0.033     0.003    0.026    0.040 1.001
sd(surftemp_Intercept)                      0.035     0.006    0.023    0.047 1.000
sd(no3cat_Intercept)                        0.878     0.128    0.649    1.147 1.000
sd(shape_n2o_Intercept)                     0.383     0.178    0.032    0.707 1.004
sd(shape_n2oeq_Intercept)                   0.561     0.114    0.336    0.781 1.002
cor(n2o_Intercept,n2o_mono3_cat)           -0.145     0.254   -0.602    0.376 1.002
cor(n2o_Intercept,n2oeq_Intercept)          0.183     0.189   -0.180    0.549 1.003
cor(n2o_mono3_cat,n2oeq_Intercept)          0.201     0.164   -0.122    0.519 1.002
cor(n2o_Intercept,surftemp_Intercept)      -0.023     0.237   -0.476    0.438 1.004
cor(n2o_mono3_cat,surftemp_Intercept)      -0.232     0.215   -0.644    0.194 1.001
cor(n2oeq_Intercept,surftemp_Intercept)    -0.134     0.190   -0.494    0.246 1.000
cor(n2o_Intercept,no3cat_Intercept)         0.462     0.211    0.018    0.822 1.004
cor(n2o_mono3_cat,no3cat_Intercept)         0.145     0.200   -0.251    0.531 1.000
cor(n2oeq_Intercept,no3cat_Intercept)       0.054     0.140   -0.220    0.329 1.000
cor(surftemp_Intercept,no3cat_Intercept)   -0.231     0.189   -0.586    0.154 1.001
                                         Bulk_ESS Tail_ESS
sd(n2o_Intercept)                            1025     1337
sd(n2o_mono3_cat)                            3366     4975
sd(n2oeq_Intercept)                          2809     4611
sd(surftemp_Intercept)                       4124     5116
sd(no3cat_Intercept)                         4389     6247
sd(shape_n2o_Intercept)                       649     1304
sd(shape_n2oeq_Intercept)                    1908     1810
cor(n2o_Intercept,n2o_mono3_cat)             1069     1955
cor(n2o_Intercept,n2oeq_Intercept)            794     1260
cor(n2o_mono3_cat,n2oeq_Intercept)            833     1873
cor(n2o_Intercept,surftemp_Intercept)        1849     3618
cor(n2o_mono3_cat,surftemp_Intercept)        2626     4267
cor(n2oeq_Intercept,surftemp_Intercept)      6340     6860
cor(n2o_Intercept,no3cat_Intercept)           893     1944
cor(n2o_mono3_cat,no3cat_Intercept)          1746     3085
cor(n2oeq_Intercept,no3cat_Intercept)        6378     7983
cor(surftemp_Intercept,no3cat_Intercept)     3573     5949

~WSA9:state:size_cat (Number of levels: 352) 
                                         Estimate Est.Error l-95% CI u-95% CI  Rhat
sd(n2o_Intercept)                           0.038     0.014    0.006    0.064 1.011
sd(n2oeq_Intercept)                         0.004     0.002    0.000    0.008 1.009
sd(surftemp_Intercept)                      0.010     0.007    0.000    0.024 1.003
sd(no3cat_Intercept)                        0.308     0.182    0.016    0.673 1.002
sd(shape_n2o_Intercept)                     0.895     0.110    0.684    1.114 1.002
sd(shape_n2oeq_Intercept)                   0.511     0.104    0.309    0.715 1.002
cor(n2o_Intercept,n2oeq_Intercept)          0.310     0.349   -0.486    0.841 1.004
cor(n2o_Intercept,surftemp_Intercept)       0.015     0.365   -0.677    0.705 1.000
cor(n2oeq_Intercept,surftemp_Intercept)    -0.102     0.381   -0.761    0.659 1.001
cor(n2o_Intercept,no3cat_Intercept)        -0.090     0.342   -0.716    0.601 1.001
cor(n2oeq_Intercept,no3cat_Intercept)      -0.143     0.359   -0.757    0.611 1.001
cor(surftemp_Intercept,no3cat_Intercept)    0.173     0.378   -0.599    0.807 1.001
                                         Bulk_ESS Tail_ESS
sd(n2o_Intercept)                             580     1036
sd(n2oeq_Intercept)                          1062     3160
sd(surftemp_Intercept)                       1917     3889
sd(no3cat_Intercept)                         1078     2386
sd(shape_n2o_Intercept)                      1068     2786
sd(shape_n2oeq_Intercept)                    1397     2071
cor(n2o_Intercept,n2oeq_Intercept)           2099     4705
cor(n2o_Intercept,surftemp_Intercept)        4876     6414
cor(n2oeq_Intercept,surftemp_Intercept)      4195     5741
cor(n2o_Intercept,no3cat_Intercept)          3188     5334
cor(n2oeq_Intercept,no3cat_Intercept)        2749     4946
cor(surftemp_Intercept,no3cat_Intercept)     2607     5750

Population-Level Effects: 
                         Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
n2o_Intercept               2.392     0.056    2.285    2.500 1.000     3063     5517
shape_n2o_Intercept         3.215     0.189    2.849    3.589 1.001     3997     6108
n2oeq_Intercept             3.111     0.053    3.011    3.218 1.000     5146     6075
shape_n2oeq_Intercept       3.873     0.751    2.385    5.326 1.001     5619     6535
surftemp_Intercept          3.791     0.060    3.672    3.906 1.000     7414     7632
shape_surftemp_Intercept    8.637     0.460    7.721    9.522 1.001    10817     7718
no3cat_Intercept[1]        -3.025     0.615   -4.258   -1.864 1.000     5932     6622
no3cat_Intercept[2]        -2.059     0.605   -3.274   -0.903 1.000     6278     6695
no3cat_Intercept[3]        -1.027     0.600   -2.235    0.115 1.000     6689     6957
no3cat_Intercept[4]        -0.028     0.602   -1.247    1.113 1.000     7002     6940
n2o_log_area                0.028     0.003    0.022    0.034 1.000     6958     7668
n2o_surftemp               -0.025     0.002   -0.029   -0.021 1.000     3145     6260
shape_n2o_log_area          0.190     0.041    0.110    0.271 1.000     4322     5605
n2oeq_surftemp             -0.042     0.002   -0.045   -0.038 1.000     5836     6494
n2oeq_log_elev             -0.080     0.008   -0.097   -0.064 1.000     5425     6234
n2oeq_surftemp:log_elev     0.002     0.000    0.002    0.003 1.000     5857     6696
shape_n2oeq_surftemp        0.131     0.020    0.092    0.171 1.001     6725     7112
shape_n2oeq_log_elev        0.030     0.077   -0.117    0.186 1.001     4914     6636
surftemp_lat               -0.016     0.001   -0.019   -0.013 1.000     7349     7598
shape_surftemp_lat         -0.105     0.011   -0.127   -0.083 1.000    11330     7919
no3cat_surftemp            -0.141     0.023   -0.187   -0.096 1.001     6713     7328
no3cat_log_area             0.068     0.035   -0.001    0.137 1.000    11692     8177
surftemp_slog_elev_1       -3.477     0.479   -4.414   -2.557 1.000     5891     6330
surftemp_sjdate_1          -0.008     0.528   -1.074    1.017 1.000     4010     5523
n2o_mono3_cat               0.007     0.087   -0.172    0.175 1.004     1650     1855
n2o_mono3_cat:log_area     -0.046     0.009   -0.063   -0.027 1.001     2079     2821
n2o_mono3_cat:surftemp      0.018     0.004    0.009    0.026 1.004     1417     1621
shape_n2o_mono3_cat        -0.510     0.070   -0.649   -0.375 1.001     4691     6283

Simplex Parameters: 
                           Estimate Est.Error l-95% CI u-95% CI  Rhat Bulk_ESS Tail_ESS
n2o_mono3_cat1[1]             0.025     0.023    0.001    0.083 1.000     4899     5237
n2o_mono3_cat1[2]             0.183     0.111    0.014    0.425 1.002     1056     3343
n2o_mono3_cat1[3]             0.408     0.158    0.103    0.725 1.002     2167     4583
n2o_mono3_cat1[4]             0.384     0.169    0.058    0.713 1.003     1608     3021
n2o_mono3_cat:log_area1[1]    0.046     0.029    0.003    0.112 1.000     4018     3309
n2o_mono3_cat:log_area1[2]    0.033     0.028    0.001    0.104 1.001     6173     4948
n2o_mono3_cat:log_area1[3]    0.289     0.134    0.057    0.598 1.000     3286     4461
n2o_mono3_cat:log_area1[4]    0.631     0.140    0.301    0.864 1.000     3186     4184
n2o_mono3_cat:surftemp1[1]    0.028     0.019    0.003    0.060 1.001     3744     3410
n2o_mono3_cat:surftemp1[2]    0.066     0.033    0.011    0.128 1.001     3821     3213
n2o_mono3_cat:surftemp1[3]    0.281     0.079    0.122    0.431 1.003     2897     2971
n2o_mono3_cat:surftemp1[4]    0.625     0.088    0.461    0.798 1.003     2740     3106
shape_n2o_mono3_cat1[1]       0.116     0.067    0.009    0.263 1.001     5363     4297
shape_n2o_mono3_cat1[2]       0.149     0.096    0.008    0.365 1.001     3469     3637
shape_n2o_mono3_cat1[3]       0.401     0.151    0.116    0.703 1.001     4122     5848
shape_n2o_mono3_cat1[4]       0.334     0.143    0.051    0.604 1.001     3853     4561

Family Specific Parameters: 
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc_no3cat    1.000     0.000    1.000    1.000   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
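
As a quick screen of the diagnostics in the summary above, a few rows can be compared against rule-of-thumb thresholds. The sketch below is illustrative only: the values were copied by hand from the summary for a handful of parameters, and the thresholds (Rhat above 1.01, effective sample size below 400) and object names are assumptions, not part of the original workflow.

```r
# Rhat and ESS values copied by hand from the summary above (subset of rows only)
mcmc_diag <- data.frame(
  parameter = c("sd(n2o_Intercept) | WSA9:state:size_cat",
                "sd(n2oeq_Intercept) | WSA9:state:size_cat",
                "sd(shape_n2o_Intercept) | WSA9:state",
                "cor(n2o_Intercept,n2oeq_Intercept) | WSA9:state",
                "n2o_mono3_cat:surftemp",
                "n2o_Intercept"),
  rhat     = c(1.011, 1.009, 1.004, 1.003, 1.004, 1.000),
  bulk_ess = c(580, 1062, 649, 794, 1417, 3063),
  tail_ess = c(1036, 3160, 1304, 1260, 1621, 5517)
)

# Assumed rule-of-thumb thresholds (not taken from the original analysis)
rhat_max <- 1.01
ess_min  <- 400  # roughly 100 effective draws per chain for 4 chains

mcmc_diag$flag_rhat <- mcmc_diag$rhat > rhat_max
mcmc_diag$flag_ess  <- pmin(mcmc_diag$bulk_ess, mcmc_diag$tail_ess) < ess_min

# Rows that exceed either threshold
flagged <- mcmc_diag[mcmc_diag$flag_rhat | mcmc_diag$flag_ess, ]
print(flagged)
```

With these assumed thresholds, only the size-category standard deviation for the N2O intercept (Rhat = 1.011) is flagged among the rows included here.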

3.6.2 Model checks

Below, the same PPCs for N2O and N2O-eq were employed as before.

N2O PPC

The PPCs for N2O from this model were similarly reasonable to those for models 4 and 5 above.


3.6.2.1 Equilibrium N2O PPC

Again, the PPCs for N2O-eq in this model were similar to those for models 4 and 5.


3.6.2.2 Bivariate PPC

This model again provided a very reasonable representation of the bivariate relationship between N2O and N2O-eq (below).

3.6.2.3 Saturation PPC

The saturation ratio PPCs below behaved similarly to those for models 4 and 5 above, but with perhaps slightly less bias in the predicted proportion of undersaturated waterbodies and fewer extreme predictions for the means and standard deviations. The observed vs. predicted PPC also appeared to have a better-behaved variance and no extreme predictions compared to models 4 and 5, which used lognormal errors.

The plot below shows the same PPC, but for the “test” or second-visit data. Overall, the model appeared to perform similarly on these data as on the data used to fit it.

3.6.2.4 R-square

Below are estimates for the Bayesian \(R^2\). The estimates for N2O and N2O-eq were largely similar to those from models 4 and 5 above. The \(R^2\) for the surface temperature response also suggested a fairly good fit.

      Estimate Est.Error  Q2.5 Q97.5
R2n2o    0.646     0.059 0.503 0.731
        Estimate Est.Error  Q2.5 Q97.5
R2n2oeq    0.863     0.006 0.851 0.874
           Estimate Est.Error  Q2.5 Q97.5
R2surftemp    0.744      0.01 0.723 0.763

Below are the same \(R^2\) estimates, but for the second-visit data. These estimates were similar to those for the data used to fit the model, suggesting that the model may perform similarly well out-of-sample.

      Estimate Est.Error  Q2.5 Q97.5
R2n2o    0.607     0.137 0.325  0.85
        Estimate Est.Error Q2.5 Q97.5
R2n2oeq    0.857     0.008 0.84 0.872
           Estimate Est.Error  Q2.5 Q97.5
R2surftemp     0.75     0.018 0.715 0.783
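
For reference, the quantity summarized in the tables above can be written out explicitly. The expression below assumes the estimates were produced with the default residual-based Bayesian \(R^2\) as implemented in brms::bayes_R2(); the notation (\(s\) indexing posterior draws, \(n\) indexing observations) is introduced here for illustration only.

```latex
R^2_{(s)} =
  \frac{\operatorname{Var}_{n}\!\left(\hat{y}^{(s)}_{n}\right)}
       {\operatorname{Var}_{n}\!\left(\hat{y}^{(s)}_{n}\right)
        + \operatorname{Var}_{n}\!\left(y_{n} - \hat{y}^{(s)}_{n}\right)}
```

Under this assumption, the Estimate column is the mean of \(R^2_{(s)}\) over posterior draws, and Q2.5 and Q97.5 are the corresponding quantiles across draws.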

3.6.3 Covariate effects

3.6.3.1 N2O

The conditional effects plot for the covariate effects on N2O suggested a similar effect of NO3 as before, but also interesting interactions between NO3 and lake area and between NO3 and surface temperature. For lake area, the effect was estimated to be larger and more negative at the highest levels of NO3, and slightly negative at the lowest level of NO3. For surface temperature, the effect was estimated to be largest and positive at the highest level of NO3, and negative at the lowest level of NO3.
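
As a rough check on the surface temperature interaction described above, the slope of surface temperature (on the log scale) can be approximated at each NO3 category from the posterior means reported in the model summary. The sketch below assumes the brms monotonic-effect construction, in which the contribution at a given category is the coefficient times the number of simplex elements times the cumulative sum of the simplex up to that category. Plugging in posterior means ignores posterior uncertainty and correlations among parameters, so the values are illustrative only, and the object names are hypothetical.

```r
# Posterior means copied from the model summary (plug-in values, illustration only)
b_surftemp    <- -0.025  # n2o_surftemp: slope at the lowest NO3 category
b_mo_surftemp <-  0.018  # n2o_mono3_cat:surftemp
zeta <- c(0.028, 0.066, 0.281, 0.625)  # n2o_mono3_cat:surftemp1[1] to [4]

# Monotonic transform at the five ordered categories:
# zero at the lowest category, then D times the cumulative sum of the simplex
D <- length(zeta)
mo_value <- D * c(0, cumsum(zeta))

# Approximate slope of surftemp at each NO3 category (1 = lowest, 5 = highest)
slope_surftemp <- b_surftemp + b_mo_surftemp * mo_value
names(slope_surftemp) <- paste0("no3_cat_", 1:5)
print(round(slope_surftemp, 3))
```

Under these plug-in values the slope is about -0.025 at the lowest NO3 category and about 0.047 at the highest, which is consistent with the sign change described above.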

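The estimated covariate effects on the shape parameter (modeled on the log scale) were positive for log(area) and negative for NO3. Because the variance of a Gamma response decreases as the shape increases, this suggested a negative relationship between the residual variation and log(area) and a positive relationship, again, with NO3.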
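
For reference, the link between the shape parameter and the variability of the response can be made explicit. The block below is a sketch of the mean and shape parameterization of the Gamma likelihood with log links, as used in the model formulas above; the symbols \(\mu_i\), \(\alpha_i\), \(\eta_i\), and \(\eta^{\text{shape}}_i\) are notation introduced here for illustration.

```latex
y_i \sim \operatorname{Gamma}\!\left(\alpha_i,\ \alpha_i / \mu_i\right), \qquad
\log \mu_i = \eta_i, \qquad
\log \alpha_i = \eta^{\text{shape}}_i

\operatorname{E}[y_i] = \mu_i, \qquad
\operatorname{Var}[y_i] = \frac{\mu_i^{2}}{\alpha_i}, \qquad
\operatorname{CV}[y_i] = \frac{1}{\sqrt{\alpha_i}}
```

A positive coefficient in the shape submodel therefore corresponds to a smaller coefficient of variation, and a negative coefficient to a larger one.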

3.6.3.2 Equilibrium N2O

The estimated covariate effects on N2O-eq remained largely the same as those estimated in the previous model.

4 Predict to population

As previously described, in order to make inferences to the population of interest, the final model above was used to, first, predict surface temperature in the target population, since it depended only on the fully observed covariates. Next, the predictive distribution for surface temperature was used, along with the relevant fully observed covariates, to predict NO3 in the target population. Finally, the predictive distributions for temperature and NO3 were used to predict the N2O responses. The code for these steps is outlined in the following.

The first step used the final model to predict surface temperature for the population:

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

predict_temp <- sframe %>%
  mutate(jdate = 205) %>%
  add_predicted_draws(n2o_mod6, resp=c("surftemp"), 
                      allow_new_levels = TRUE, 
                      cores =1, 
                      ndraws = 500) %>%
  mutate(surftemp = .prediction)

save(predict_temp, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

NO3 was predicted next. Note that the posterior predictive distribution for NO3 was subsampled in order to minimize excess simulations.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

temp_X <- predict_temp %>% # select relevant columns as predictors
  ungroup() %>%
  select(WSA9, state, size_cat, log_area, surftemp)


rm(predict_temp) # reduce memory
gc()

# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(5) 
doSNOW::registerDoSNOW(cl) 

# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

system.time( # approx 26 hrs with 5 workers & 500 draws from PPD
predict_no3 <- foreach(sub_X = isplitRows(temp_X, chunkSize = 155299), 
                       .combine = 'c',
                       .packages = c("brms"),
                       .options.snow = opts
                       ) %dopar% {
                         apply(brms::posterior_predict(n2o_mod6,
                                                 newdata = sub_X,
                                                 resp = "no3cat",
                                                 allow_new_levels = TRUE,
                                                 ndraws = 500,
                                                 cores = 1), 2, sample, 1)
                         }
)


close(pb)
parallel::stopCluster(cl)

save(predict_no3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
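
The subsampling step in the chunk above, apply(..., 2, sample, 1), keeps a single predictive draw for each row of the new data. A minimal base R sketch of that step is below, using a small simulated matrix as a hypothetical stand-in for the draws-by-rows output of posterior_predict(); the dimensions and values are made up for illustration.

```r
set.seed(1)

# Hypothetical stand-in for posterior_predict() output:
# 500 predictive draws (rows) for 6 new-data rows (columns),
# with values in the ordinal categories 1 to 5
ppd <- matrix(sample(1:5, 500 * 6, replace = TRUE), nrow = 500, ncol = 6)

# For each column (new-data row), keep one randomly chosen predictive draw
one_draw <- apply(ppd, 2, sample, 1)

length(one_draw)  # one value per new-data row
```

Because each row of the new data already corresponds to one lake and one surface temperature draw, keeping a single draw per row preserves 500 draws per lake rather than 500 x 500.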

Finally, N2O and N2O-eq were predicted using the surface temperature and nitrate predictions along with the survey variables and known covariates. Again, the posterior was subsampled in order to reduce excess simulations.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

# Assemble dataframe containing relevant covariates (known and predicted)
n2o_X <- predict_temp %>%
  ungroup() %>%
  mutate(no3_cat = predict_no3) %>%
  select(WSA9,
         state,
         size_cat,
         log_area,
         surftemp,
         log_elev,
         no3_cat)

# clear objects to reduce memory overhead
rm(predict_no3, predict_temp) 
gc()

# save the predictors for n2o and n2oeq
save(n2o_X, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")

# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(6) 
doSNOW::registerDoSNOW(cl) 

# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

# make predictions in parallel
system.time(
predict_n2o <- foreach(sub_X = isplitRows(n2o_X, chunkSize = 155299),
                 .combine = rbind,
                 .options.snow = opts,
                 .packages = c("brms")) %dopar% {
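  # with two responses, posterior_predict() returns a draws x rows x responses array
  pp <- posterior_predict(n2o_mod6,
                          newdata = sub_X,
                          resp = c("n2o", "n2oeq"),
                          allow_new_levels = TRUE,
                          ndraws = 500,
                          cores = 1)
  # sample one draw per row and keep both responses from that same draw
  idx <- sample.int(dim(pp)[1], dim(pp)[2], replace = TRUE)
  cbind(pp[cbind(idx, seq_len(dim(pp)[2]), 1)],
        pp[cbind(idx, seq_len(dim(pp)[2]), 2)])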
                   }
)

close(pb)
parallel::stopCluster(cl)

colnames(predict_n2o) <- c("n2o", "n2oeq")

save(predict_n2o, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")

Finally, the predictions for all four partially observed responses were assembled into a new dataframe for use in inference.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

all_predictions <- predict_temp %>%
  ungroup() %>%
  mutate(no3cat = predict_no3) %>%
  bind_cols(predict_n2o) %>%
  mutate(n2osat = n2o / n2oeq, # calculate saturation ratio
         .row = rep(1:465897, each = 500),
         .draw = rep(seq(1,500, 1), 465897)) %>%
  mutate(area_ha = exp(log_area)) %>% # include area on ha scale
  select(WSA9,
         state,
         size_cat,
         area_ha,
         lat,
         lon,
         .row,
         .draw,
         surftemp,
         no3cat,
         n2o,
         n2oeq,
         n2osat)

rm(predict_n2o, predict_temp, predict_no3) # clean up workspace for RAM
gc()
 

save(all_predictions, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

5 Population estimates

A number of estimates for the target population are presented below. First, the full posterior predictive distributions for dissolved N2O, equilibrium N2O, and the saturation ratio were assessed. These distributions summarized the predicted distribution of concentrations or ratios for all lakes in the population of interest and included parameter uncertainty propagated through the model. Next, population means were assessed, followed by comparisons of some model-based estimates to previously calculated design-based estimates.

5.1 Posterior predictive distributions

Below, a density plot summarized the posterior predictive distribution of N2O and N2O-eq concentrations across the target population of lakes, based on 500 draws from the posterior predictive distribution. Note that the x-axis was truncated at 50 nmol/L for a clearer visualization of the bulk of the predicted distribution. For reference, the maximum predicted value was 4403.2 nmol/L for dissolved N2O, 20.4 nmol/L for equilibrium N2O, and 793.5 for the saturation ratio.

5.2 Estimated means

5.2.1 National

Below are density plots summarizing the posterior distribution of means for N2O concentrations and the saturation ratio for the target population (i.e., all US lakes > 1ha in the lower 48 states).

To illustrate the skewness in the predictive distribution for the saturation ratio, an estimate for the median ratio is shown below. The entire posterior distribution of the mean above is larger than 1, the ratio representing the boundary of under- vs. oversaturation. By comparison, the posterior estimate of the median below only included values less than 1, suggesting that, though the mean saturation ratio was greater than 1, most lakes in the national population were undersaturated (i.e., ratio less than 1). In right-skewed distributions, the mean can often be considerably larger than the median.
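
The contrast between the mean and the median described above can be reproduced with a simple simulation. The sketch below draws from a right-skewed lognormal distribution whose parameters are hypothetical and chosen only for illustration; it is not the fitted model or the predicted saturation ratios.

```r
set.seed(2017)

# Hypothetical right-skewed "saturation ratios" (lognormal, illustration only)
sat <- rlnorm(1e5, meanlog = -0.1, sdlog = 0.8)

skew_summary <- c(mean       = mean(sat),      # pulled upward by the right tail
                  median     = median(sat),    # middle of the distribution
                  prop_under = mean(sat < 1))  # share of values below 1
print(round(skew_summary, 3))
```

In this toy example the mean exceeds 1 while the median is below 1 and more than half of the simulated values are below 1, which is the same qualitative pattern described for the national estimates.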

Below is a plot of the posterior mean estimate for the proportion of undersaturated lakes at the national scale.

5.2.2 Ecoregion

Below are posterior estimates of the means for dissolved and equilibrium N2O and the saturation ratio by WSA9 ecoregion.

A plot of the posterior estimates for the median saturation ratio below indicated, again, that most lakes in each ecoregion were undersaturated (i.e., median << 1).

A plot of the estimates of the proportion of undersaturated lakes by ecoregion is below.

5.2.3 State

Comparisons of mean estimates (posterior median with the 2.5th and 97.5th percentiles) by state are below. Density estimates were not included to minimize plot space.

Below is a plot of estimates for the mean (black circles) and median (grey circles) saturation ratio by state. A horizontal, dashed, black line is shown at ratio = 1, indicating the boundary for under- vs. oversaturation. Only a few states (e.g., NV, DE) had median estimates that were 1 or greater, suggesting that, for most states, most lakes were undersaturated.

Finally, below is a plot of the estimated proportion of undersaturated lakes for each state in the target population. Point estimates are the posterior medians of the proportions and bars span the central 95% interval of the posterior distributions of the proportions.

5.2.4 Size category

The estimated means by size category are below for dissolved and equilibrium N2O and the saturation ratio. Median estimates for the saturation ratio are also shown.

The mean and median estimates are compared below.

And, finally, the estimated proportion of undersaturated lakes in the target population by size category is shown below.

5.3 Model- vs. design-based

Below, estimates from the model-based approach are compared to design-based estimates. In general, the model estimates were similar to the design-based estimates. Model estimates were typically within the confidence bounds of the design-based estimates, but with much greater precision. Improved precision was expected due to the “shrinkage” induced by the multilevel parameterization, which allowed some “borrowing” of information across the various levels of the survey factors.
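
The “shrinkage” mentioned above can be illustrated with the textbook Gaussian varying-intercept case. The expression below is an analogy only, not the Gamma model fitted in this document, and the symbols (\(\bar{y}_j\), \(n_j\), \(\mu\), \(\tau\), \(\sigma\)) are introduced here for illustration.

```latex
\hat{\theta}_j \approx w_j\,\bar{y}_j + (1 - w_j)\,\mu,
\qquad
w_j = \frac{\tau^{2}}{\tau^{2} + \sigma^{2} / n_j}
```

Here \(\bar{y}_j\) is the sample mean for group \(j\) with \(n_j\) observations, \(\mu\) is the overall mean, \(\tau\) is the between-group standard deviation, and \(\sigma\) is the within-group standard deviation. Groups with few observations receive a small weight \(w_j\) and are pulled toward the overall mean, which is the sense in which information is borrowed across levels of the survey factors.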

5.3.1 Dissolved N2O

Below, National mean estimates for dissolved N2O from the model and design-based approaches were compared. The sample-based estimate was also included as a naive reference.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_n2o = mean(n2o)) %>%
  summarise(estimate = round(median(mean_n2o), 2), # posterior median
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2)) %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  add_row(estimate = round(mean(df_model$n2o), 2),
          type = "sample") %>%
  print()
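
The model-based summary in the chunk above follows a two-step pattern: a population mean is computed within each posterior draw, and those per-draw means are then summarized across draws. A minimal base R sketch of that pattern is below, using a small, hypothetical data frame in place of all_predictions (4 lakes and 3 draws with made-up values).

```r
# Hypothetical stand-in for all_predictions: 4 lakes x 3 posterior draws
toy <- data.frame(
  .row  = rep(1:4, each = 3),
  .draw = rep(1:3, times = 4),
  n2o   = c(6.1, 7.0, 6.5,
            8.2, 7.9, 8.8,
            5.5, 6.0, 5.8,
            9.1, 8.7, 9.4)
)

# Step 1: one population mean per posterior draw
mean_by_draw <- tapply(toy$n2o, toy$.draw, mean)

# Step 2: summarize the per-draw means across draws
est <- c(estimate = median(mean_by_draw),
         LCL = unname(quantile(mean_by_draw, probs = 0.025)),
         UCL = unname(quantile(mean_by_draw, probs = 0.975)))
print(round(est, 2))
```

Averaging within draws first, rather than pooling all rows, is what allows the interval to reflect posterior uncertainty in the population mean rather than the spread of individual lakes.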

The black, vertical, dashed line in the figure below represents the mean of the sample.

Below, estimates were compared by ecoregion.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  bind_rows(cbind(n2o_survey_ests[-10,], type = rep("survey", 9))) %>%
  arrange(ecoregion) %>%
  print()

Means were compared according to size categories below.

5.3.2 Saturation

Below, the same comparisons were made for the saturation estimates.

load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  add_row(estimate = round(mean(df_model$n2o / df_model$n2o_eq), 3),
          type = "sample") %>%
  print()

6 References

7 Session Info

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] brms_2.18.0      Rcpp_1.0.9       tidybayes_3.0.2  bayesplot_1.9.0  itertools_0.1-3 
 [6] iterators_1.0.14 foreach_1.5.2    future_1.28.0    forcats_0.5.2    stringr_1.5.0   
[11] purrr_0.3.5      readr_2.1.3      tidyr_1.2.1      tibble_3.1.8     tidyverse_1.3.2 
[16] dplyr_1.0.10     ggrepel_0.9.2    kableExtra_1.3.4 gridExtra_2.3    ggExtra_0.10.0  
[21] moments_0.14.1   ggpubr_0.4.0     ggplot2_3.4.0   

loaded via a namespace (and not attached):
  [1] readxl_1.4.1         backports_1.4.1      systemfonts_1.0.4    plyr_1.8.8          
  [5] igraph_1.3.5         svUnit_1.0.6         splines_4.1.2        listenv_0.8.0       
  [9] crosstalk_1.2.0      rstantools_2.2.0     inline_0.3.19        digest_0.6.31       
 [13] htmltools_0.5.4      fansi_1.0.3          magrittr_2.0.3       checkmate_2.1.0     
 [17] DHARMa_0.4.6         googlesheets4_1.0.1  tzdb_0.3.0           globals_0.16.1      
 [21] modelr_0.1.9         RcppParallel_5.1.5   matrixStats_0.62.0   xts_0.12.2          
 [25] svglite_2.1.0        timechange_0.1.1     prettyunits_1.1.1    colorspace_2.0-3    
 [29] rvest_1.0.3          ggdist_3.2.0         haven_2.5.1          xfun_0.35           
 [33] callr_3.7.3          crayon_1.5.2         jsonlite_1.8.4       lme4_1.1-31         
 [37] zoo_1.8-11           glue_1.6.2           gtable_0.3.1         gargle_1.2.1        
 [41] webshot_0.5.4        V8_4.2.1             distributional_0.3.1 car_3.1-1           
 [45] pkgbuild_1.3.1       rstan_2.26.11        abind_1.4-5          scales_1.2.1        
 [49] mvtnorm_1.1-3        DBI_1.1.3            rstatix_0.7.0        miniUI_0.1.1.1      
 [53] isoband_0.2.6        viridisLite_0.4.1    xtable_1.8-4         diffobj_0.3.5       
 [57] stats4_4.1.2         StanHeaders_2.26.11  DT_0.26              htmlwidgets_1.6.0   
 [61] httr_1.4.4           threejs_0.3.3        arrayhelpers_1.1-0   posterior_1.3.1     
 [65] ellipsis_0.3.2       pkgconfig_2.0.3      loo_2.5.1            farver_2.1.1        
 [69] dbplyr_2.2.1         utf8_1.2.2           labeling_0.4.2       tidyselect_1.2.0    
 [73] rlang_1.0.6          reshape2_1.4.4       later_1.3.0          munsell_0.5.0       
 [77] cellranger_1.1.0     tools_4.1.2          cli_3.4.1            generics_0.1.3      
 [81] broom_1.0.1          ggridges_0.5.4       evaluate_0.19        fastmap_1.1.0       
 [85] processx_3.8.0       knitr_1.41           fs_1.5.2             nlme_3.1-161        
 [89] mime_0.12            xml2_1.3.3           compiler_4.1.2       shinythemes_1.2.0   
 [93] rstudioapi_0.14      curl_4.3.3           ggsignif_0.6.3       reprex_2.0.2        
 [97] stringi_1.7.8        ps_1.7.2             Brobdingnag_1.2-9    lattice_0.20-45     
[101] Matrix_1.5-3         nloptr_2.0.3         markdown_1.1         shinyjs_2.1.0       
[105] tensorA_0.36.2       vctrs_0.5.1          pillar_1.8.1         lifecycle_1.0.3     
[109] bridgesampling_1.1-2 cowplot_1.1.1        httpuv_1.6.6         R6_2.5.1            
[113] promises_1.2.0.1     renv_0.16.0          parallelly_1.32.1    codetools_0.2-18    
[117] boot_1.3-28.1        colourpicker_1.1.1   MASS_7.3-58.1        gtools_3.9.4        
[121] assertthat_0.2.1     withr_2.5.0          shinystan_2.6.0      mgcv_1.8-41         
[125] parallel_4.1.2       hms_1.1.2            grid_4.1.2           coda_0.19-4         
[129] minqa_1.2.5          rmarkdown_2.19       carData_3.0-5        googledrive_2.0.0   
[133] shiny_1.7.1          lubridate_1.9.0      base64enc_0.1-3      dygraphs_1.1.1.6    
Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. New York: CRC Press.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories: Analytical Methods for Social Research. 1st ed. Cambridge: Cambridge University Press.
Link, William A., and Richard J. Barker. 2010. Bayesian Inference: With Ecological Applications. 1st ed. Boston: Academic Press.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Boca Raton: CRC Press.
Merkle, Edgar C., Ellen Fitzsimmons, James Uanhoro, and Ben Goodrich. 2021. “Efficient Bayesian Structural Equation Modeling in Stan.” Journal of Statistical Software 100 (6): 1–22. https://doi.org/10.18637/jss.v100.i06.
Merkle, Edgar C., and Yves Rosseel. 2018. “Blavaan: Bayesian Structural Equation Models via Parameter Expansion.” Journal of Statistical Software 85 (4): 1–30. https://doi.org/10.18637/jss.v085.i04.
Poggiato, Giovanni, Tamara Munkemuller, Daria Bystrova, Julyan Arbel, James S. Clark, and Wilfried Thuiller. 2021. “On the Interpretations of Joint Modeling in Community Ecology.” Trends in Ecology & Evolution 36 (5): 391–401. https://doi.org/10.1016/j.tree.2021.01.002.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Stan Development Team. 2018a. “RStan: The R Interface to Stan.” http://mc-stan.org.
———. 2018b. “Stan Modeling Language Users Guide and Reference Manual, Version 2.18.0.” http://mc-stan.org.
———. 2018c. “The Stan Core Library, Version 2.18.0.” http://mc-stan.org.
Warton, David I., F. Guillaume Blanchet, Robert B. O’Hara, Otso Ovaskainen, Sara Taskinen, Steven C. Walker, and Francis K. C. Hui. 2015. “So Many Variables: Joint Modeling in Community Ecology.” Trends in Ecology & Evolution 30 (12): 766–79. https://doi.org/10.1016/j.tree.2015.09.007.
Webb, Jackie R., Nicole M. Hayes, Gavin L. Simpson, Peter R. Leavitt, Helen M. Baulch, and Kerri Finlay. 2019. “Widespread Nitrous Oxide Undersaturation in Farm Waterbodies Creates an Unexpected Greenhouse Gas Sink.” Proceedings of the National Academy of Sciences 116 (20): 9814–19. https://doi.org/10.1073/pnas.1820389116.
Zachmann, Luke J., Erin M. Borgman, Dana L. Witwicki, Megan C. Swan, Cheryl McIntyre, and N. Thompson Hobbs. 2022. “Bayesian Models for Analysis of Inventory and Monitoring Data with Non-Ignorable Missingness.” Journal of Agricultural, Biological and Environmental Statistics 27: 125–48. https://doi.org/10.1007/s13253-021-00473-z.
---
title: "Bayesian models for NLA 2017 $N_2O$ survey data"
author: "Roy Martin, Jake Beaulieu, Michael McManus"
date: "`r Sys.Date()`"
output:
  pdf_document:
    toc: yes
    toc_depth: '4'
  html_notebook:
    toc: yes
    toc_depth: 4
    toc_float: yes
    code_folding: show
    font_size: 14
    number_sections: yes
    theme: simplex
bibliography: RWM_Endnote_Library.bib
link-citations: yes
editor_options:
  chunk_output_type: inline
---

```{r begin, eval=TRUE, include=FALSE}
library(ggpubr)
library(moments)
library(ggplot2)
library(ggExtra)
library(gridExtra)
library(kableExtra)
library(ggrepel)
library(dplyr)
library(tidyverse)
library(tidyr)
library(future)
library(foreach)
library(itertools)
library(bayesplot)
library(tidybayes)
library(brms)

options(mc.cores = parallel::detectCores(logical = FALSE))
options( max.print = 1000 )

# Identify local path for each user
localPath <- Sys.getenv("USERPROFILE")

# Define helper functions
# standardized formatting for column names
toEPA <- function(X1){
  names(X1) = tolower(names(X1))
  names(X1) = gsub(pattern = c("\\(| |#|)|/|-|\\+|:|_"), replacement = ".", x = names(X1))
  X1
}

# stat: skew 
skew <- function(x) {
  xdev <- x - mean(x)
  n <- length(x)
  r <- sum(xdev^3) / sum(xdev^2)^1.5
  return(r * sqrt(n) * (1 - 1/n)^1.5)
}

# function for DHARMa residual analysis
check_brms <- function(model,             # brms model
                       integer = FALSE,   # integer response? (TRUE/FALSE)
                       plot = TRUE,       # make plot?
                       resp = NULL,
                       ...                # further arguments for DHARMa::plotResiduals 
) {
  
  mdata <- brms::standata(model)
  if(!"Y" %in% names(mdata))
    oResp <- mdata[[paste0("Y_", resp)]]
  else
    oResp <- mdata[["Y"]]
  #  stop("Cannot extract the required information from this brms model")
  
  dharma.obj <- DHARMa::createDHARMa(
    simulatedResponse = t(brms::posterior_predict(model, resp = resp, ndraws = 1000)),
    observedResponse = oResp, 
    fittedPredictedResponse = apply(
      t(brms::posterior_epred(model, resp = resp, ndraws = 1000, re.form = NA)),
      1,
      median),
    integerResponse = integer)
  
  if (isTRUE(plot)) {
    plot(dharma.obj, ...)
  }
  
  invisible(dharma.obj)
  
}
```

# Rationale and Objectives
This document details the modeling workflow implemented for estimating dissolved and equilibrium N2O concentrations and saturation ratios using the 2017 National Lakes Assessment (NLA) survey data. The NLA sampling sites were distributed among the target population of US lakes (in the lower 48 states) according to a probabilistic survey design with samples stratified among categories of lake surface area, WSA9 ecoregion, and US state (excluding AK and HI). Due to the stratification scheme, some types of lakes in the sample population were intentionally over-represented (e.g., large lakes) and some were under-represented (e.g., small lakes) relative to the target population. Because of this unequal probability design, inferences from the sample had to be adjusted before they could be extended to the broader populations of interest (e.g., national-, state-, ecoregion-, and size class-specific estimates).

The concept of the "complete data likelihood" is useful for conceptualizing biases arising from sampling design [@Zachmann_etal_2022; @Gelman_etal_2014 Ch. 8; @Link_Barker_2010]. For the NLA survey data, the population of US lakes in the lower 48 states larger than 4 hectares was considered the complete data and the probabilistic samples were considered a subset of that complete data. The portion of US lakes not included in the sample was considered "missing" from the complete data _not_ at random, but conditional on the pre-specified design (stratification) variables. This non-random missingness was not ignorable for the purpose of making inferences from the sample to the target population. In a model-based framework, however, including the design parameters as predictors in a regression model is one way to adjust for the missingness. For a thorough and recent treatment of this concept in the context of national surveys of environmental resources, refer to @Zachmann_etal_2022. This concept is a key motivator for the increasingly popular multilevel regression with poststratification (MRP) approach to model-based inference [@Gelman_etal_2014; @Gelman_etal_2020 Ch. 17].

The following workflow illustrates our model-based approach, based largely on the logic of MRP, but with an elaboration on the poststratification step to enable eventual estimates of total gas flux at the population level, which required scaling up from lake-level estimates. The typical MRP process is carried out in two steps. The first step is to fit regression models for the response variables of interest (e.g., dissolved N2O, equilibrium N2O) conditional on the survey design variables (i.e., ecoregion, state, lake size). The second step is poststratification, wherein the posterior parameter estimates from the regression model for the sample population are weighted based on their known or assumed distribution in the population of interest [i.e., poststratification table; @Gelman_etal_2020 Ch. 17]. The poststratification table in our case, for example, would be a population summary of lakes among the design variables: ecoregion, state, and size category. However, because we eventually needed lake-level estimates, instead of predicting to a poststratification table, we predicted to each individual lake in the population of interest. This meant predicting to the full target population of 465,897 natural and man-made US lakes larger than 4 hectares in the lower 48 states. These predictions were assumed relevant to average conditions during the "index period" for each lake in 2017. Details about the sampling frame as well as the target population are further clarified in the workbook below with data summaries and code.
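
As a minimal sketch of the poststratification step described above, the toy example below weights cell-level estimates $\hat{\theta}_c$ by population counts $N_c$, i.e., $\hat{\theta} = \sum_c N_c \hat{\theta}_c / \sum_c N_c$. The data frame `cells` and every value in it are hypothetical and chosen only for illustration; they are not NLA data and the calculation is not the workflow used later in this document, which predicted to individual lakes.

```r
# Hypothetical poststratification cells (illustrative values, not NLA data)
cells <- data.frame(
  size_cat = c("4_10", "10_20", "50_max"),
  n_lakes  = c(600, 300, 100),   # assumed population counts per cell
  est_n2o  = c(8.0, 9.0, 12.0),  # assumed model-based cell means
  n_sample = c(10, 20, 70)       # assumed sample counts per cell
)

# Poststratified mean: cell estimates weighted by population counts
post_mean <- with(cells, sum(n_lakes * est_n2o) / sum(n_lakes))

# Sample-weighted mean, which over-represents the large-lake cell
naive_mean <- with(cells, sum(n_sample * est_n2o) / sum(n_sample))

print(c(poststratified = post_mean, naive = naive_mean))
```

In this toy case the sample-weighted mean is pulled toward the over-sampled large-lake cell, whereas the poststratified mean reflects the assumed population composition.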

For the regressions, we used multilevel models fit in a fully Bayesian fashion. Multilevel models are thought to work well in this context because they provide regularized estimates along the design groupings, which can improve out-of-sample inferences [@McElreath_2020]. Inferences for lake types that may be missing from the sample, but are part of the population of interest, are also straightforward using this approach [@Gelman_etal_2020 Ch. 17; @McElreath_2020]. More information on these models, their specific parameters, R code, fit evaluations, and resulting inferences is presented in this document.

The overriding objective of the modeling effort was to provide population level estimates for (1) dissolved and equilibrium N2O concentrations; (2) the N2O saturation ratio (i.e., dissolved N2O/equilibrium N2O); and (3) the proportion of under-saturated water bodies (i.e., saturation ratio < 1). The estimates would also be used to later estimate the total flux of N2O gas attributable to the target population of lakes over the index period. The saturation ratio estimates were calculated as a derived quantity based on the ratio of modeled dissolved to equilibrium N2O. Because dissolved and equilibrium N2O were observed on the same sample units (lake sites), we developed models for estimating their joint distribution. The response variable in the models was, therefore, multivariate to account for potential statistical dependencies between dissolved and equilibrium N2O due to, for example, common dependencies on geography. Although point predictions of the mean marginal probabilities from separate models could be comparable, a joint model allowing correlated observation-level errors (i.e., residuals) was expected to better capture uncertainty and potentially improve out-of-sample predictions, should the variables be conditionally correlated [@Warton_etal_2015; @Poggiato_etal_2021]. All of the models fit were constructed using the `brms` package [@Burkner_2017] in `R` [@R_Core_Team_2021] as an interface to Stan, a software package for fitting fully Bayesian models via Hamiltonian Monte Carlo [HMC; @Stan_Development_Team_2018_a; @Stan_Development_Team_2018_b; @Stan_Development_Team_2018_c].
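
The derived quantities listed above can be illustrated with a small sketch. The matrices `n2o` and `n2o_eq` below are simulated stand-ins for posterior predictive draws (rows are draws, columns are lakes); the distributions, dimensions, and object names are assumptions made for illustration only, and the draws are simulated independently here, whereas a joint model could induce correlation between them.

```r
set.seed(1)
n_draws <- 1000
n_lakes <- 5

# Simulated stand-ins for posterior predictive draws (rows = draws, cols = lakes)
n2o    <- matrix(rlnorm(n_draws * n_lakes, meanlog = log(8), sdlog = 0.3),
                 nrow = n_draws)
n2o_eq <- matrix(rlnorm(n_draws * n_lakes, meanlog = log(8), sdlog = 0.1),
                 nrow = n_draws)

# Saturation ratio computed draw by draw and lake by lake
sat <- n2o / n2o_eq

# Population-level summaries within each draw
mean_sat   <- rowMeans(sat)      # mean saturation ratio per draw
prop_under <- rowMeans(sat < 1)  # proportion of under-saturated lakes per draw

# Posterior median and 95% interval for each summary
print(quantile(mean_sat, probs = c(0.025, 0.5, 0.975)))
print(quantile(prop_under, probs = c(0.025, 0.5, 0.975)))
```

Computing the ratio within each draw, before summarizing, carries the uncertainty in both dissolved and equilibrium N2O through to the saturation ratio and the under-saturation proportion.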

# Data
As explained in a previous data munging document
(https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dgIndicatorAnalysis.html), duplicate dissolved gas samples were collected at a depth of ~0.1m at designated index sites distributed across 1091 lakes nationwide, of which 95 were sampled twice as repeat visits. This randomly selected subset of revisit sites was used as a test set for assessing model fit and out-of-sample performance. 

Gas samples were analyzed via gas chromatography and concentrations were recorded to the nearest 0.001 nmol/L. The samples were collected under a stratified, unequal probability design and each gas observation was indexed to an individual lake selected with unequal probability from 5 different lake size categories, $j = 1, \ldots, J$ with $J = 5$, according to surface area (ha), and from within a state, $k = 1, \ldots, K$ with $K = 48$, situated within an aggregated WSA9 or Omernik ecoregion, $l = 1, \ldots, L$ with $L = 9$. All 9 WSA9 ecoregions were represented in the sample, including Xeric (XER), Western Mountain (WMT), Northern Plains (NPL), Southern Plains (SPL), Temperate Plains (TPL), Coastal Plains (CPL), Upper Midwest (UMW), Northern Appalachian (NAP), and Southern Appalachian (SAP) regions. As shown below, the data from the initial and revisit samples were separately compiled into data frame objects in `R`, with $n=984$ and $n=95$ rows, respectively, of gas observations indexed to the survey design variables and several potentially relevant covariates.

## Import
The gas data and covariates were previously described and munged at 
https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dataMunge.html. That dataset was imported below. 
```{r import_data, eval=FALSE, include=TRUE}
load( file = paste0( localPath,
              "/Environmental Protection Agency (EPA)/",
              "ORD NLA17 Dissolved Gas - Documents/",
              "inputData/dg.2021-02-01.RData")
      )

save(dg, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda") 
```

From the imported dataset, a new data frame for modeling was constructed from the original file including only the variables of interest: (1) the N2O gas observations; (2) the survey design variables indexed to those observations; and (3) additional covariates considered potentially useful for improving the fit of the model. The data frame below excluded the second-visit observations, which would later be used for model checking. Some variables from the imported data were renamed for convenience. In addition, the NO3 covariate was rounded according to the documented measurement precision. An alternative version of the NO3 covariate was also created in this step by log-transforming and re-coding it as an ordered factor with five levels at hand-drawn cut points. The left-most cut point separated observations below the detection limit from the completely observed samples. The remaining cut points in the positive direction were drawn at approximately equal distances along the log scale. Finally, it should be noted that one lake that was sampled was missing information on the N2O gas measurements and it was removed from the data frame.
```{r model_data, echo=TRUE, paged.print=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")

dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>% # 1 obs with missing measurement
  nrow() # number of observations before filtering

df_model <- dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>%
  filter(sitetype == "PROB") %>% # probability samples only
  filter(visit.no == 1) %>%
  mutate(n2o = round(dissolved.n2o.nmol, 2),
         n2o_eq = round(sat.n2o.nmol, 2),
         n2o_sat = n2o.sat.ratio,
         n2o_em = e.n2o.nmol.d,
         n2o_flux = f.n2o.m.d,
         WSA9 = factor(ag.eco9),
         state = factor(state.abb[match(state.nm, state.name)]),
         area_ha = area.ha,
         log_area = log(area_ha),
         chla = chla.result,
         log_chla = log(chla),
         elev = elevation,
         log_elev = log(elev + 1),
         do_surf = o2.surf,
         log_do = log(do_surf),
         bf_max = max.bf,
         sqrt_bf = sqrt(bf_max),
         size_cat = recode(area.cat6, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat,
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
  mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
                       breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
                       labels =seq(1, 5, 1))) %>%
  mutate(no3_cat = factor(no3_cat,
                          levels = seq(1, 5, 1),
                          ordered = TRUE)) %>%
  mutate(date = as.Date(date.col)) %>%
  mutate(jdate = as.numeric(format(date, "%j"))) %>% 
  mutate(lat = map.lat.dd,
         lon = map.lon.dd) %>% # longitude
  mutate(surftemp = surftemp,
         log_surftemp = log(surftemp)) %>% 
  select(WSA9,
         state,
         size_cat,
         site.id,
         lat,
         lon,
         date,
         jdate,
         surftemp,
         log_surftemp,
         area_ha,
         log_area,
         elev,
         log_elev,
         chla,
         log_chla,
         do_surf,
         log_do,
         bf_max,
         sqrt_bf,
         n2o,
         n2o_eq,
         no3,
         no3_cat
         )

save(df_model, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda") 

nrow(df_model) # number of obs after filtering

print(df_model)
```
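
To make the re-coding of the NO3 covariate concrete, the sketch below applies the same cut points used in the chunk above to a handful of hypothetical nitrate values. The vector `no3` is invented for illustration; its first value, 0.0005, stands in for an observation floored at the censoring value used above.

```r
# Hypothetical nitrate values; 0.0005 mimics a floored, below-detection value
no3 <- c(0.0005, 0.002, 0.01, 0.1, 1)

# Same cut points on the log scale as in the model data chunk
no3_cat <- cut(log(no3),
               breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
               labels = 1:5,
               ordered_result = TRUE)

print(data.frame(no3, log_no3 = round(log(no3), 2), no3_cat))
```

Because `log(0.0005)` is about -7.6, floored values fall below the left-most cut point at -7.5 and are assigned to the first level, while the remaining levels span equal widths of 2 log units.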

A second dataframe, including only the second visit observations, was constructed below. These data were later used as a "test set" to assess the out-of-sample fit of the model developed on the first-visit or training data.
```{r test_data, echo=TRUE, paged.print=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")

# number of observations before filtering probability samples
dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>% # remove obs with missing response measurements
  nrow()

df_test <- dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>%
  filter(sitetype == "PROB") %>% # probability samples only
  filter(visit.no == 2) %>%
  mutate(n2o = round(dissolved.n2o.nmol, 2),
         n2o_eq = round(sat.n2o.nmol, 2),
         n2o_sat = n2o.sat.ratio,
         n2o_em = e.n2o.nmol.d,
         n2o_flux = f.n2o.m.d,
         WSA9 = factor(ag.eco9),
         state = factor(state.abb[match(state.nm, state.name)]),
         area_ha = area.ha,
         log_area = log(area_ha),
         chla = chla.result,
         log_chla = log(chla),
         elev = elevation,
         log_elev = log(elev + 1),
         do_surf = o2.surf,
         log_do = log(do_surf),
         bf_max = max.bf,
         sqrt_bf = sqrt(bf_max),
         size_cat = recode(area.cat6, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat,
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
  mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
                       breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
                       labels =seq(1, 5, 1))) %>%
  mutate(no3_cat = factor(no3_cat,
                          levels = seq(1, 5, 1),
                          ordered = TRUE)) %>%
  mutate(date = as.Date(date.col)) %>%
  mutate(jdate = as.numeric(format(date, "%j"))) %>% 
  mutate(lat = map.lat.dd,
         lon = map.lon.dd) %>% # longitude
  mutate(surftemp = surftemp,
         log_surftemp = log(surftemp)) %>% 
  select(WSA9,
         state,
         size_cat,
         site.id,
         lat,
         lon,
         date,
         jdate,
         surftemp,
         log_surftemp,
         area_ha,
         log_area,
         elev,
         log_elev,
         chla,
         log_chla,
         do_surf,
         log_do,
         bf_max,
         sqrt_bf,
         n2o,
         n2o_eq,
         no3,
         no3_cat
         )

save(df_test, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda") 

nrow(df_test) # number of obs after filtering for probability samples and second visits

print(df_test)
```

## Target population
Below, the NLA sampling frame was imported and then filtered to include only the target population or sampling frame for this project.
```{r import_sample_frame, echo=TRUE}
df_pop <- read.csv(file = paste0(localPath,
              "/Environmental Protection Agency (EPA)/",
              "ORD NLA17 Dissolved Gas - Documents/",
              "inputData/NLA_Sample_Frame.csv"), header = TRUE)

sframe <- df_pop %>%
  filter(nla17_sf != "Exclude2017") %>%
  filter(nla17_sf != "Exclude2017_Include2017NH") %>%
  filter(state != "DC") %>%
  filter(state != "HI") %>%
  droplevels() %>%
  mutate(WSA9 = factor(ag_eco9),
         WSA9 = forcats::fct_drop(WSA9), # remove NA level
         state = factor(state),
         size_cat = factor(area_cat6),
         lat = lat_dd83,
         lon = lon_dd83,
         log_area = log(area_ha),
         elev = elevation,
         log_elev = ifelse(elev <= 0, 0, elev), # assumed elev < 0 to be elev = 0
         log_elev = log(log_elev + 1)
         ) %>% 
  mutate(size_cat = recode(size_cat, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat, 
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  select(WSA9, state, size_cat, lat, lon, area_ha, log_area, elev, log_elev)

rm(df_pop)

save(sframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda") 

print(sframe)
```

The resulting target population above included a total of 465,897 waterbodies.

Cross-tabulations below describe the structure of the target population with respect to the design variables. The cross-tabulation makes it clear that not every ecoregion contains every state. Therefore, in the statistical sense, states were nested in ecoregions.
```{r frame_dimensions_1, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

sframe %>%
  group_by(WSA9, state) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(state, n) %>%
  print()
```

Likewise, lake size category was nested in state (which was nested in ecoregion). That is, not every ecoregion:state in the population of interest contained every size category (below).
```{r frame_dimensions_4, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

sframe %>%
  group_by(WSA9, state, size_cat) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(size_cat, n) %>%
  print()
```
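
The way a cross-tabulation exposes empty design combinations can be seen in a toy example. The data frame `frame` below is hypothetical (the state labels are placeholders, not real states), and base R's `table()` is used here in place of the `group_by()` and `spread()` approach above purely to keep the sketch self-contained.

```r
# Hypothetical mini frame: one row per lake, placeholder state labels
frame <- data.frame(
  WSA9  = c("CPL", "CPL", "SAP", "SAP", "NAP"),
  state = c("A",   "B",   "B",   "C",   "D")
)

# Rows = ecoregions, columns = states; zeros mark combinations with no lakes
tab <- table(frame$WSA9, frame$state)
print(tab)

# Number of non-empty ecoregion:state cells
n_cells <- sum(tab > 0)
print(n_cells)
```

Only the non-empty cells correspond to ecoregion:state groupings that exist in the frame; the zero entries are combinations that never occur.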

Below, the sampling frame was selected down to create a post-stratification table. Some of the variables were renamed to match the naming conventions used in the observational data above. There were 536 types of lakes in the population of interest with respect to the sampling design. The counts of those lake types (n_lakes) and their proportions relative to the total population of lakes in the sampling frame (prop_cell) are indicated below.
```{r filter_frame, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

pframe <- sframe %>%
  mutate(obs = 1) %>%
  group_by(WSA9, state, size_cat) %>%
  summarise(n_lakes = sum(obs), .groups = "drop") %>%
  ungroup() %>%
  mutate(prop_cell = n_lakes/sum(n_lakes)) %>%
  mutate(type = "population") 

save(pframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

print(pframe)
```
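
The cell counts and proportions in the table above follow a simple pattern that can be sketched on a toy frame. The data frame `frame` and its values are hypothetical, and base R's `aggregate()` is used as a stand-in for the `group_by()` and `summarise()` calls above so that the example runs without additional packages.

```r
# Hypothetical mini sampling frame: one row per lake
frame <- data.frame(
  WSA9     = c("CPL", "CPL", "CPL", "NAP", "NAP", "NAP", "NAP", "NAP"),
  size_cat = c("4_10", "4_10", "50_max", "4_10", "4_10", "4_10", "10_20", "50_max")
)

# Count lakes per cell; only combinations present in the frame appear
cell_tab <- aggregate(n_lakes ~ WSA9 + size_cat,
                      data = transform(frame, n_lakes = 1),
                      FUN = sum)

# Convert counts to proportions of the whole frame
cell_tab$prop_cell <- cell_tab$n_lakes / sum(cell_tab$n_lakes)

print(cell_tab)
```

The proportions sum to 1 across cells, so each `prop_cell` can be read directly as a poststratification weight.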

## Sample vs. population
Below, the lake distributions in the population of interest were compared to the proportions in the observed sample. There were 352 lake types in the sample compared to the 536 in the population of interest. In total, there were 984 observations distributed across these 352 lake types in the sample, and the number of samples was not distributed evenly across the types. Some cells were represented by as few as 1 lake. In total, 536 - 352 = 184 lake types in the population of interest were not represented in the sample.
```{r sample_cell_counts, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

samp_props <- df_model %>%
  mutate(obs = 1) %>%
  group_by(WSA9, state, size_cat) %>%
  summarize(n_lakes = sum(obs), .groups = "drop") %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes / sum(n_lakes), 7)) %>%
  mutate(type = "sample") 

save(samp_props, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

print(samp_props)
```
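
One way to list the lake types that are present in the population but absent from the sample is a set difference on cell labels, sketched below. The labels in `pop_cells` and `samp_cells` are hypothetical and follow an assumed `WSA9:state:size_cat` pattern; they are not taken from the NLA frame.

```r
# Hypothetical cell labels (WSA9:state:size_cat) in the population and sample
pop_cells  <- c("CPL:FL:4_10", "CPL:FL:50_max", "NAP:ME:4_10", "NAP:ME:10_20")
samp_cells <- c("CPL:FL:50_max", "NAP:ME:4_10")

# Cells that exist in the population but were never sampled
unsampled <- setdiff(pop_cells, samp_cells)
print(unsampled)

# Count of unrepresented cells (analogous to 536 - 352 above)
print(length(unsampled))
```

Predictions for unsampled cells of this kind rely on the partial pooling provided by the multilevel structure, since no observations inform them directly.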

Below, a graphical comparison was constructed to depict the distribution of cells in the population of interest _versus_ those in the sample.
```{r compare_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=10}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

pframe %>%
  bind_rows(samp_props) %>%
  ggplot(aes(x = interaction(WSA9, state, size_cat), y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes( shape = type, color = type)) +
  geom_line() +
  theme_tidybayes() +
  theme(axis.text.x = element_blank()) +
  xlab("WSA9:state:size") +
  ylab("proportion in cell")
```

Another comparison between population and sample was constructed below by ecoregion. The samples were not balanced across ecoregions. Lakes in the Coastal Plains (CPL) ecoregion, for example, were clearly undersampled relative to their proportion of the population.
```{r eco_props_pop, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_eco <- pframe %>%
  group_by(WSA9) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population') 

save(pframe_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_eco.rda")
```

```{r eco_props_sample, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_eco <- samp_props %>%
  group_by(WSA9) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_eco.rda")
```

```{r compare_eco_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=6}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_eco.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_eco.rda")

pframe_eco %>%
  bind_rows(samp_props_eco) %>%
  ggplot(mapping = aes(x = WSA9, y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes( shape = type, color = type), size = 3) +
  geom_line() +
  theme_tidybayes() +
  xlab("Ecoregion") +
  ylab("proportion in cell") + 
  theme(legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 14)) +
  theme(text = element_text(size = 12))
```
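
One way to read the comparison above is as a ratio of population share to sample share, which is the adjustment that poststratification applies implicitly. The sketch below uses invented proportions and ecoregion means (none are NLA values) to show how a sample-weighted mean and a population-weighted mean can differ when the shares are unbalanced.

```r
# Hypothetical proportions for three ecoregions (not NLA values)
pop_prop  <- c(CPL = 0.50, TPL = 0.30, NAP = 0.20)
samp_prop <- c(CPL = 0.25, TPL = 0.35, NAP = 0.40)

# Ratio > 1: under-represented in the sample; ratio < 1: over-represented
rep_ratio <- pop_prop / samp_prop
print(round(rep_ratio, 2))

# Hypothetical ecoregion means of some response
eco_mean <- c(CPL = 10, TPL = 8, NAP = 6)

# A raw sample mean weights ecoregions by their sample share,
# whereas a poststratified mean weights them by their population share
sample_weighted <- sum(samp_prop * eco_mean)
pop_weighted    <- sum(pop_prop * eco_mean)
print(c(sample = sample_weighted, population = pop_weighted))
```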

A similar comparison by state was constructed below.
```{r state_props_pop, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_state <- pframe %>%
  group_by(state) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population')

save(pframe_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_state.rda")
```

```{r state_props_sample, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_state <- samp_props %>%
  group_by(state) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_state.rda")
```

```{r compare_state_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=10}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_state.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_state.rda")

pframe_state %>%
  bind_rows(samp_props_state) %>%
  ggplot(mapping = aes(x = state, y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes(shape = type, color = type)) +
  geom_line() +
  theme_tidybayes() +
  theme(axis.text.x = element_text(angle = 45)) +
  xlab("State") +
  ylab("proportion in cell")
```

Finally, a comparison by lake size category is shown below. Note that small lakes were under-sampled relative to larger lakes by design.
```{r size_props_pop, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_size <- pframe %>%
  group_by(size_cat) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population')

save(pframe_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_size.rda")
```

```{r size_props_sample, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_size <- samp_props %>%
  group_by(size_cat) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_size.rda")
```

```{r compare_size_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=6}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_size.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_size.rda")

pframe_size %>%
  bind_rows(samp_props_size) %>%
  ggplot(mapping = aes(x = size_cat, y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes( shape = type, color = type)) +
  geom_line() +
  theme_tidybayes() +
  xlab("Size category") +
  ylab("proportion in cell")
```

## Sample-based estimates
The overall mean and standard deviation for N2O in the sample:
```{r sample_summary_n2o}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o),
             sd = sd(n2o)) %>%
  print()
```

The same summary for equilibrium N2O:
```{r sample_summary_n2oeq}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  print()
```

The saturation ratio (i.e., N2O / N2O-eq):
```{r sample_summary_n2osat}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  print()
```

Finally, roughly 67% of lakes in the sample were undersaturated (i.e., saturation ratio < 1):
```{r sample_summary_propsat}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(prop_undersat = mean((n2o / n2o_eq) < 1)) %>% # proportion of TRUE values; avoids hard-coding n
  print()
```
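
The proportion of undersaturated lakes does not depend on a hard-coded sample size, because the mean of a logical vector is the proportion of `TRUE` values. A small sketch with invented concentrations (not NLA observations) illustrates the calculation.

```r
# Hypothetical concentrations (nmol/L), not NLA observations
n2o    <- c(6.1, 7.9, 12.4, 5.5, 30.2, 7.0)
n2o_eq <- c(7.4, 7.6, 7.5, 7.8, 7.3, 7.1)

sat_ratio <- n2o / n2o_eq

# Mean of a logical vector = proportion of TRUE values
prop_undersat <- mean(sat_ratio < 1)
print(prop_undersat)
```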

Using only the sample observations, the plot below shows the overall mean (dashed line) along with the ecoregion-specific means (black circles). The shaded areas indicate +/- 1 standard deviation. Neither dissolved N2O nor the saturation ratio was clearly structured by ecoregion in the sample, but there did appear to be some structure in the equilibrium N2O observations.
```{r sample_summary_eco, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  group_by(WSA9) %>%
  summarise(mean = mean(n2o),
             sd = sd( n2o)) %>%
  ggplot(aes(x = WSA9, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = WSA9), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = WSA9, y = mean))+
  geom_point()+
  geom_hline(yintercept = 8.72, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) + 
  ylab("") +
  ggtitle("N2O")

p2 <- df_model %>%
  group_by(WSA9) %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  ggplot(aes(x = WSA9, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = WSA9), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = WSA9, y = mean))+
  geom_point()+
  geom_hline(yintercept = 7.48, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) +
  ylab("Sample mean") +
  ggtitle("N2O equilibrium")

p3 <- df_model %>%
  group_by(WSA9) %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  ggplot(aes(x = WSA9, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = WSA9), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = WSA9, y = mean))+
  geom_point()+
  geom_hline(yintercept = 1.17, linetype = 'dashed') +
  theme_tidybayes() +  
  xlab("Ecoregion") +
  ylab("") +
  ggtitle("N2O saturation ratio")

grid.arrange(p1, p2, p3)
```

The same summary by state is below.
```{r sample_summary_state, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  group_by(state) %>%
  summarise(mean = mean(n2o),
             sd = sd( n2o)) %>%
  ggplot(aes(x = state, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = state), 
              fill = 'lightgrey', 
              alpha = .7) +
  geom_line(aes(x = state, y = mean))+
  geom_point()+
  geom_hline(yintercept = 8.72, linetype = 'dashed') + 
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) + 
  ylab("") +
  ggtitle("N2O")

p2 <- df_model %>%
  group_by(state) %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  ggplot(aes(x = state, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = state), 
              fill = 'lightgrey', 
              alpha = .7) +
  geom_line(aes(x = state, y = mean))+
  geom_point()+
  geom_hline(yintercept = 7.48, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) +
  ylab("Sample mean") +
  ggtitle("N2O equilibrium")

p3 <- df_model %>%
  group_by(state) %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  ggplot(aes(x = state, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = state), 
              fill = 'lightgrey', 
              alpha = .7)+
  geom_line(aes(x = state, y = mean))+
  geom_point()+
  geom_hline(yintercept = 1.17, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.text.x = element_text(angle = 45)) + 
  xlab("State") +
  ylab("") +
  ggtitle("N2O saturation ratio")

grid.arrange(p1, p2, p3)
```

Finally, the same summary by size category:
```{r sample_summary_size, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  group_by(size_cat) %>%
  summarise(mean = mean(n2o),
             sd = sd( n2o)) %>%
  ggplot(aes(x = size_cat, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = size_cat), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = size_cat, y = mean))+
  geom_point()+
  geom_hline(yintercept = 8.72, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) + 
  ylab("") +
  ggtitle("N2O")

p2 <- df_model %>%
  group_by(size_cat) %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  ggplot(aes(x = size_cat, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = size_cat), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = size_cat, y = mean))+
  geom_point()+
  geom_hline(yintercept = 7.48, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) +
  ylab("Sample mean") +
  ggtitle("N2O equilibrium")

p3 <- df_model %>%
  group_by(size_cat) %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  ggplot(aes(x = size_cat, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = size_cat), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = size_cat, y = mean))+
  geom_point()+
  geom_hline(yintercept = 1.17, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.text.x = element_text(angle = 45)) + 
  xlab("Size class") +
  ylab("") +
  ggtitle("N2O saturation ratio")

grid.arrange(p1, p2, p3)
```

## Sample data exploration
Below, the empirical distributions of the N2O observations in the sample were summarized using density and rug plots. Note the natural log scale of the x axis. Both the N2O and equilibrium N2O data had considerable right skew even after the log transformation, which was not unexpected and has been noted in other studies [@Webb_etal_2019]. The saturation ratio was also skewed since it was derived from the other two observed variables (i.e., sat_ratio = n2o / n2o_eq).
```{r summary_N2O, echo=FALSE, fig.align='center', fig.height=8, fig.width=5, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  ggplot(aes(x = n2o)) +
  geom_boxplot(aes(x = n2o, y = -0.5), outlier.shape = NA, alpha = 0.7) + 
  geom_density(aes(x = n2o)) +
  geom_rug(aes(x = n2o), show.legend = F ) +
  theme(text = element_text(size=12)) +
  scale_x_continuous(trans = "log", breaks = c(0, 5, 10, 25, 50, 150)) +
  ylab("density") +
  xlab("N2O (nmol/L)") +
  theme_tidybayes() +
  theme(axis.text.y = element_blank())

p2 <- df_model %>%
  ggplot(aes(x = n2o_eq)) +
  geom_boxplot(aes(x = n2o_eq, y = -0.5), outlier.shape = NA, alpha = 0.7) + 
  geom_density(aes(x = n2o_eq)) +
  geom_rug(aes(x = n2o_eq), show.legend = F ) +
  theme(text = element_text(size=12)) +
  scale_x_continuous(trans = "log") +
  ylab("density") +
  xlab("Equilibrium N2O (nmol/L)") +
  theme_tidybayes() +
  theme(axis.text.y = element_blank())

p3 <- df_model %>%
  ggplot(aes(x = n2o / n2o_eq)) +
  geom_boxplot(aes(x = n2o / n2o_eq, y = -0.5), outlier.shape = NA, alpha = 0.7) + 
  geom_density(aes(x = n2o / n2o_eq)) +
  geom_rug(aes(x = n2o / n2o_eq), show.legend = F ) +
  theme(text = element_text(size=12)) +
  scale_x_continuous(trans = "log", breaks = c(0, 1, 5, 10, 20)) +
  ylab("density") +
  xlab("N2O saturation ratio") +
  theme_tidybayes() +
  theme(axis.text.y = element_blank())

grid.arrange(p1, p2, p3)
```

Below are plots of N2O vs. NO3. The first plot shows log(N2O) vs. log(NO3), as well as the ordinal categories assigned to NO3 (vertical lines). The leftmost vertical line is dashed and separates the NO3 observations below the detection limit.
```{r summary_N2O_vs_NO3, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  geom_vline(xintercept = -7.5, linetype = "dashed") +
  geom_vline(xintercept = -5.5) +
  geom_vline(xintercept = -3.5) +
  geom_vline(xintercept = -1.5) +
  theme(text = element_text(size=12)) +
  theme_bw()
```

In the plot above, the trend is increasing and nonlinear on the log scale. The increasing variance in N2O along the NO3 gradient suggested a potential mediator of the relationship between NO3 and N2O. Below are plots of N2O vs. NO3 for 6 quantile-based bins of the surface temperature measurements (bins increasing from 1 to 6). These plots suggested that the NO3 effect on N2O may have been stronger in lakes with higher observed temperatures.
```{r summary_N2O_vs_NO3_surftemp, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(surftemp, 6))) +
  theme_bw()
```

The next plot below shows the relationship between N2O and NO3 at 6 different quantiles (increasing 1 to 6) of the log-scaled lake surface area estimates.
```{r summary_N2O_vs_NO3_logarea, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(log_area, 6))) +
  theme_bw()
```

Similar plots are below, but with NO3 expressed as an ordered categorical variable with 5 levels. The positive and monotonic trends are similar to the previous plots where NO3 was treated as continuous. Note the large number of observations in the first NO3 category (no3_cat = 1). This category represented all of the censored observations for NO3, which made up most of the data.
```{r summary_N2O_vs_NO3cat, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(outlier.shape = NA, notch = TRUE, color = "black", alpha = 0.7) +
  theme(text = element_text(size=12)) +
  theme_bw()
```
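
An ordered NO3 factor like the one used in these plots could be built from the continuous values with `cut()`. The sketch below is only illustrative: it assumes the category boundaries coincide with the vertical lines drawn at log(NO3) = -7.5, -5.5, -3.5, and -1.5 in the earlier scatterplot, and the `no3` values are invented.

```r
# Hypothetical NO3 values (invented for illustration)
no3 <- c(0.0003, 0.002, 0.01, 0.1, 0.5, 0.0004)

# Assumed cut points on the log scale, taken from the vertical lines in the scatterplot
log_breaks <- c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf)

# Five ordered levels; level 1 holds values at or below the lowest cut point
no3_cat <- cut(log(no3), breaks = log_breaks, labels = 1:5, ordered_result = TRUE)

print(table(no3_cat))
```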

```{r summary_N2O_vs_NO3cat_surftemp, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(outlier.shape = NA, color = "black", alpha = 0.7) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(surftemp, 6))) +
  theme_bw()
```

```{r summary_N2O_vs_NO3cat_logarea, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(outlier.shape = NA, color = "black", alpha = 0.7) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(log_area, 6))) +
  theme_bw()
```

Below is a plot of log(N2O) vs. log(NO3) by ecoregion, which suggested that the NO3 effect on N2O may have varied by ecoregion.
```{r summary_N2O_vs_NO3_ecoregion, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  geom_vline(xintercept = -7.5, linetype = "dashed") +
  geom_vline(xintercept = -5.5) +
  geom_vline(xintercept = -3.5) +
  geom_vline(xintercept = -1.5) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ WSA9) +
  theme_bw()
```

Below is the same plot as above but for the ordered categorical version of NO3.
```{r summary_N2O_vs_NO3cat_ecoregion, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(aes(x = no3_cat, y = log(n2o)), 
               outlier.shape = NA, 
               color = "black", 
               alpha = 0.7) + 
  facet_wrap(~ WSA9) +
  theme(text = element_text(size=12)) +
  theme_bw()
```

The plot below shows trends by state within just the Temperate Plains (TPL) ecoregion. Within states, the number of observations was relatively small, but the trends appeared closer to linear.
```{r summary_N2O_vs_NO3_wsa9state3, echo=FALSE, fig.align='center', fig.height=4, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  filter(WSA9 == "TPL") %>%
  ggplot(aes(x = log(no3), y = log(n2o), group = state, color = state)) +
  geom_point(show.legend = F) +
  geom_smooth(method = "lm", alpha = 0.1) + # span applies only to loess smooths, so it is omitted here
  theme(text = element_text(size=12)) +
  theme_bw()
```

# Model fitting
The first regression model was constructed to estimate the joint distribution of log-transformed N2O and equilibrium N2O conditional on the design factors. Each log-transformed observation, $i \in 1, \ldots, N = 984$, for each response, $p \in 1, \ldots, P = 2$, was assumed to be drawn from a multivariate normal distribution with the parameters $\nu$ and $\Sigma$, where $\nu$ is the multivariate mean estimated conditional on the design effects and $\Sigma$ is a covariance matrix containing the observation-level variances and residual correlation:
$$Y \sim MVN(\nu, \Sigma)$$

The multivariate mean is a vector of mean parameters, $\nu:[\mu_{p=1}, \mu_{p=2}]$, for each response. Each mean is further defined by a linear combination of parameters where, for each response $p$ and observation $i$:

$$\begin{aligned}
\mu_{pi} &= \alpha_{0(pi)} + \alpha_{1(pij)} + \alpha_{2(pijk)} + \alpha_{3(pijkl)} \\
\alpha_1 &\sim MVN(0, \Lambda_1) \\
\alpha_2 &\sim MVN(0, \Lambda_2) \\
\alpha_3 &\sim MVN(0, \Lambda_3)
\end{aligned}$$

The linear combination of parameters defining $\mu$ above includes a fixed global intercept, $\alpha_0$, that is estimated directly from the data, and three separate, latent group-level effects matrices, $\alpha_1, \alpha_2, \alpha_3$. The group effects were assumed to be multivariate normal and centered on zero in multivariate space. The spread of the effects around zero is determined by a covariance matrix, $\Lambda_1, \Lambda_2, \text{or } \Lambda_3$, each of which is estimated directly from the data. These covariance terms are further defined where:

$$\Lambda = \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix} \chi \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix}$$

The $\tau$ parameters are the group-level scale parameters, which constrain the spread of effects for each response, and $\chi$ comprises the group-level residual correlation matrix:

$$\chi = \begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}$$

wherein $\varrho$ is the group-level residual correlation between responses.
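
Reading the scale matrices as diagonal matrices of the $\tau$ parameters, which is the usual way of decomposing a covariance matrix into scales and a correlation matrix, the construction of $\Lambda$ can be sketched numerically in base R. The values of `tau` and `varrho` below are invented for illustration and are not estimates from the fitted model.

```r
# Hypothetical group-level scales and correlation (not model estimates)
tau    <- c(0.4, 0.2)  # one scale parameter per response
varrho <- 0.3          # correlation between responses

# Correlation matrix and the implied covariance matrix
chi    <- matrix(c(1, varrho, varrho, 1), nrow = 2)
Lambda <- diag(tau) %*% chi %*% diag(tau)
print(Lambda)

# cov2cor() recovers the correlation matrix from the covariance matrix
print(cov2cor(Lambda))
```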

The explicit indexing in the notation above conveys the relationship between the parameters and each observation, $i$, and emphasizes the nested structure of the observations within the group effects. Specifically, every observation, $i$, was nested in a lake size category, $l$, which was nested in a state, $k$, and ecoregion, $j$. The parameter $\alpha_1$, therefore, accounted for ecoregion-scale group effects or deviations from the global mean; $\alpha_2$ accounted for state-level group effects nested in ecoregions; and $\alpha_3$ accounted for lake size group effects within states and ecoregions.  
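
The nesting can be made concrete with a small simulation. The sketch below builds a hypothetical design (3 ecoregions, 2 states per ecoregion, 2 size classes per state) and a linear predictor for a single response; the intercept and the group-level standard deviations are invented, and the multivariate correlation between responses is left out to keep the example short.

```r
set.seed(1)

# Hypothetical nested design: 3 ecoregions x 2 states x 2 size classes
design <- expand.grid(size = 1:2, state = 1:2, eco = 1:3)

# State labels repeat across ecoregions, so nesting is encoded with interactions
design$eco_state      <- interaction(design$eco, design$state, drop = TRUE)
design$eco_state_size <- interaction(design$eco, design$state, design$size, drop = TRUE)

# Hypothetical global intercept and zero-centered group-level effects
alpha0  <- 2
a_eco   <- rnorm(length(unique(design$eco)), 0, 0.3)
a_state <- rnorm(nlevels(design$eco_state), 0, 0.2)
a_size  <- rnorm(nlevels(design$eco_state_size), 0, 0.1)

# Linear predictor: global intercept plus one effect from each nesting level
design$mu <- alpha0 +
  a_eco[design$eco] +
  a_state[as.integer(design$eco_state)] +
  a_size[as.integer(design$eco_state_size)]

print(head(design))
```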

Finally, the observation-level covariance term, $\Sigma$, was parameterized as:
$$\Sigma = \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix} \Omega \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix}$$

wherein the $\sigma$ parameters are the observation-level standard deviations for each response and $\Omega$ comprises the observation-level residual correlation matrix:
$$\Omega = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
wherein $\rho$ is the residual correlation between responses.
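
Multiplying out this parameterization, with the scale matrices read as diagonal matrices of the standard deviations, gives the familiar form of a bivariate covariance matrix, with variances on the diagonal and the covariance on the off-diagonal:

$$\Sigma = \begin{pmatrix} \sigma_{p=1}^2 & \rho \, \sigma_{p=1} \sigma_{p=2} \\ \rho \, \sigma_{p=1} \sigma_{p=2} & \sigma_{p=2}^2 \end{pmatrix}$$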

For model fitting, priors were needed for all parameters conditioned directly on the data, which included the global intercepts, the scale parameters, and the correlation matrices. A normal (Gaussian) prior, $N(\mu = 2, \sigma = 1)$, centered near the (log-scale) data means, was used for the global intercept parameter for each response. This prior was considered minimally informative, as it placed most (~80%) of the prior mass over values between approximately 2 and 27 nmol/L for median N2O or equilibrium N2O concentration and included support in the tails for values approaching 0 nmol/L on the low end and 80 nmol/L on the high end. $Exp(2)$ priors were placed over all scale parameters, which put most of the support between values very close to 0 and values near 1 (central 80% density interval from approximately 0.05 to 1.15). Finally, for the correlation matrices, an $LKJ(\eta = 2)$ prior was used, which, for a 2-dimensional response, placed most support on correlations between approximately -0.9 and 0.9. This prior seemed reasonable, as there were no clear causal mechanisms thought to ensure a strong direct correlation between the N2O measures. Any potential residual dependence was expected to be indirect due to, for example, a common causal factor (e.g., elevation, temperature). For more information on prior choice recommendations in Stan, see: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
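
The intervals implied by these priors can be checked with base R quantile functions. The sketch below is a quick check rather than part of the fitting workflow. It relies on one fact about the LKJ distribution: for a 2 x 2 correlation matrix, the density of the correlation is proportional to $(1 - \rho^2)^{\eta - 1}$, so $(\rho + 1)/2$ follows a $Beta(\eta, \eta)$ distribution. The choice of a central 99% interval for the correlation is an assumption made here for illustration.

```r
# Central 80% interval of the N(2, 1) intercept prior, back-transformed from the log scale
intercept_80 <- exp(qnorm(c(0.10, 0.90), mean = 2, sd = 1))

# Central 80% interval of the Exp(2) prior on the scale parameters
scale_80 <- qexp(c(0.10, 0.90), rate = 2)

# For a 2 x 2 correlation matrix, LKJ(eta) implies (rho + 1) / 2 ~ Beta(eta, eta)
eta    <- 2
rho_99 <- 2 * qbeta(c(0.005, 0.995), eta, eta) - 1

print(round(intercept_80, 1))
print(round(scale_80, 3))
print(round(rho_99, 2))
```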

The $\textbf{brms}$ package [@Burkner_2017] for $\textbf{R}$ [@R_Core_Team_2021] was used to fit all of the models in a fully Bayesian setting. The formula syntax of the $\textbf{brms}$ package is similar to the syntax used in the $\textbf{lme4}$ package, which is widely used to fit mixed effects models in frequentist settings. In either package, the linear predictor for $\mu$ described above could be expressed as:

$$\sim 1 + (1|WSA9) + (1|WSA9:state) + (1|WSA9:state:size)$$

In the $\textbf{brms}$ package, there is additional functionality and syntax for multivariate responses and for allowing the varying intercepts in a multivariate model to be correlated, e.g.:

$$\begin{aligned}
N_2O_{dissolved} &\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) + (1|c|WSA9:state:size) \\
N_2O_{equilibrium} &\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) + (1|c|WSA9:state:size)
\end{aligned}$$

The above syntax indicates that the linear predictors for both responses in the multivariate model have the same group-level varying effects, and that each of those effects is allowed to be correlated between responses.

For the remainder of this document, only this simplified syntax is presented to describe the model parameterizations. For more information on $\textbf{brms}$ functionality and syntax with multivariate response models, the package vignette may be helpful, and can be found at: https://cran.r-project.org/web/packages/brms/vignettes/brms_multivariate.html.

## Model 1
The first model fit was the one described above.
```{r n2o_mod_mv_1, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ 1 + 
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ 1 + 
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) +
               (1 | c | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"), # centered near data mean
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(exponential(2), class = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), # centered near data mean
  prior(exponential(2), class = "sd", resp = "logn2oeq"),
  prior(exponential(2), class = "sigma", resp = "logn2oeq"),
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod1 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                control = list(adapt_delta = 0.99, max_treedepth = 14),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 145,
                chains=4, 
                iter=5000, 
                cores=4)

save(n2o_mod1, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
```


### Summarize fit
The summaries of the estimated parameters and key HMC convergence diagnostics for the fitted model are printed below. There were no obvious issues with the HMC sampling. All $\hat{R}$ values were less than 1.01 and effective sample size ($ESS$) calculations suggested that the posterior contained a sufficient number of effective samples for conducting inference.
```{r print_mod1, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")

print(n2o_mod1, prior = T)
```
In the summary above, the estimated standard deviations for the varying group effects on the mean behavior of the dissolved N2O response suggested fairly low, but non-zero variability across each of the three levels. The standard deviations estimated for the same varying effects for equilibrium N2O were also relatively small. Finally, note the relatively small, but positive residual correlation between the two N2O responses. 

Before investing too much into the interpretation of this model, however, the model fit was evaluated below using a series of graphical posterior predictive checks [PPC; @Gelman_etal_2014; @Gelman_etal_2020, Ch. 11].

### Model checks
#### Dissolved N2O
Below is a series of panels illustrating graphical PPCs for the log(N2O) component of the model. The top left panel compares a density plot of the observed data (black line) to density lines drawn for 200 samples from the posterior predictive distribution (PPD; blue lines) of the fitted model. The top right panel similarly compares the empirical cumulative distribution functions. The left middle panel simultaneously compares means _vs._ standard deviations for 1000 draws from the PPD (blue dots) to the sample mean and standard deviation (black dot). The right middle panel compares skewness _vs._ kurtosis for 1000 draws from the PPD to the skewness and kurtosis values calculated for the observed data. The bottom left panel compares max _vs._ min values for 1000 draws from the PPD to the max and min values of the sample data. Finally, the bottom right panel shows the observed _vs._ average predicted values for each observation in the sample. The average predicted values were calculated as the mean prediction for each observation in the PPD based on 1000 draws.
```{r ppc_n2o1, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "scatter_avg",
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
The general takeaway from the PPCs above was that the model replicated the central tendency of the observed data fairly well, but failed to sufficiently replicate other important aspects of the distribution, such as skewness and kurtosis. The observed _vs._ average predictions scatterplot suggested substantial heteroscedasticity in the errors. 
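
The skewness and kurtosis comparison mentioned above can be sketched by hand. The example below is a minimal base-R illustration using simulated stand-in data rather than the NLA observations. The `skewness()` and `kurtosis()` helpers are simple moment-based definitions assumed for illustration; the functions used in the checks above may apply different bias corrections.

```r
# Moment-based skewness and kurtosis (hypothetical helpers; package
# implementations may apply different bias corrections).
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}
kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2
}

set.seed(42)
# Stand-in "observed" data: right-skewed, unlike the Gaussian model below
y <- rexp(500)
# Stand-in posterior predictive draws: 200 replicated data sets (rows)
yrep <- matrix(rnorm(200 * 500, mean = mean(y), sd = sd(y)), nrow = 200)

# One statistic per replicated data set, compared with the observed value
skew_rep <- apply(yrep, 1, skewness)
kurt_rep <- apply(yrep, 1, kurtosis)
# Share of replicates at least as skewed as the data
p_skew <- mean(skew_rep >= skewness(y))
```

A value of `p_skew` near 0 or 1 indicates that the replicated data sets rarely reproduce the observed skewness, which is the kind of misfit the scatterplot panels display graphically.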

The same checks were run below, but for the test set of 95 held-out, second-visit data points. 
```{r ppc_n2o1_test, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
grid.arrange(
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "scatter_avg",
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
The patterns in misfit indicated above for the re-visit data were similar to the patterns indicated in the PPCs with the training data.

#### Equilibrium N2O
Below are PPCs for the equilibrium N2O component of the model. As with the dissolved N2O response above, the model replicated the central tendency reasonably well, but performed less well at replicating some important aspects of the overall distribution. 
```{r ppc_n2oeq1, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

Below are the same PPCs for equilibrium N2O in the re-visit sites.
```{r ppc_n2oeq1_test, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
pp_check(n2o_mod1,
         newdata = df_test,
         resp = "logn2oeq",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
#### Bivariate
The graphical check below compares bivariate density contours estimated from the observed data (black lines) to density contours estimated for each of 20 draws from the PPD. The model appeared to do a good job of replicating the bivariate mean, but was poor at representing the overall joint distribution.
```{r ppc_biv1, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 20) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o), y = log(n2o_eq))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

The same bivariate check is shown below for the re-visit data.
```{r ppc_biv1_test, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 20) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o), y = log(n2o_eq))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
The graphical PPCs below were aimed at evaluating how well the multivariate model did at representing the observed saturation ratio: 
$$N_2O_{dissolved}:N_2O_{equilibrium}$$
This quantity was estimated as a derived variable by dividing the dissolved N2O PPD by the equilibrium N2O PPD, draw by draw. Likewise, the proportion of under-saturated lakes in the sample was estimated by counting the lakes in each posterior predictive draw for which the ratio was < 1 and dividing that count by the total number of lakes in the sample, which was 984.
Overall, these checks indicated that properly representing the tails of the N2O and N2O-eq observations would likely be necessary in order to better replicate the observed saturation metrics. For example, the model did a poor job replicating the observed proportion of under-saturated lakes, underestimating it by more than 10 percentage points, on average.
```{r ppc_sat1, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(logn2o < logn2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

The top left panel, above, is a density plot of the observed saturation ratio (black line) compared to an estimate using 50 draws from the model (blue lines). The top right panel shows the observed proportion of under-saturated lakes compared to a model estimate based on 500 draws from the PPD. The left middle panel shows the mean _vs._ standard deviation of the saturation ratio for the observed data compared to the same estimates for 500 posterior draws from the model's PPD. The right middle panel shows the max _vs._ min for the sample compared to 500 draws from the model's PPD. Finally, the bottom left panel shows the observed _vs._ average predicted saturation ratio (averaged over 500 draws) for all 984 lakes sampled in the dataset.
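
The derived quantities shown in these panels can be sketched in a few lines of base R. The example below is a minimal illustration, assuming the posterior predictive draws are arranged as draws-by-lakes matrices on the log scale. The matrices are simulated independently as stand-ins, so the residual correlation between the two responses is ignored here, and the object names are hypothetical.

```r
set.seed(7)
n_draws <- 500
n_lakes <- 100
# Stand-in PPD matrices on the log scale (rows = draws, columns = lakes)
logn2o   <- matrix(rnorm(n_draws * n_lakes, 2.1, 0.3), n_draws, n_lakes)
logn2oeq <- matrix(rnorm(n_draws * n_lakes, 2.0, 0.1), n_draws, n_lakes)

# Saturation ratio per draw and lake; exp(a) / exp(b) equals exp(a - b)
sat_pred <- exp(logn2o) / exp(logn2oeq)
# Proportion of under-saturated lakes within each draw
prop_under <- rowMeans(sat_pred < 1)
# Posterior predictive summary of that proportion
quantile(prop_under, c(0.025, 0.5, 0.975))
```

Because the proportion is computed within each draw, its spread across draws reflects the predictive uncertainty that the histogram panel compares against the single observed proportion.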

The same PPCs are shown below for the re-visit data.
```{r ppc_sat1_test, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(logn2o < logn2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 95) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_test, mapping = aes(xintercept = sum(n2o < n2o_eq)/95)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
The checks above indicated that the model did a similarly underwhelming job of replicating some key properties of the saturation metrics calculated from the re-visit data.

#### R-square
Below, the Bayesian $R^2$ values are reported for each response in the model. 
```{r r2_1, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
round(bayes_R2(n2o_mod1, resp = "logn2o", cores = 1), 3)
round(bayes_R2(n2o_mod1, resp = "logn2oeq", cores = 1), 3)
```

The $R^2$ values were also estimated for the re-visit data.
```{r r2_1_test, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
round(bayes_R2(n2o_mod1, resp = "logn2o", newdata = df_test, cores = 1), 3)
round(bayes_R2(n2o_mod1, resp = "logn2oeq", newdata = df_test, cores = 1), 3)
```
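
For readers unfamiliar with the Bayesian $R^2$, the sketch below illustrates a residual-based definition in base R: for each posterior draw, the variance of the fitted values is divided by the sum of that variance and the variance of the residuals. This definition is assumed here to correspond to the quantity reported above. The data and the "posterior draws" are simulated stand-ins, not output from the fitted model.

```r
set.seed(11)
n <- 200
n_draws <- 1000
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 1)
# Stand-in posterior draws of the intercept and slope
b0 <- rnorm(n_draws, 1, 0.05)
b1 <- rnorm(n_draws, 0.5, 0.05)
# Expected values for every draw (rows) and observation (columns)
mu <- b0 + outer(b1, x)

# Per-draw variance of the fitted values and of the residuals
var_fit <- apply(mu, 1, var)
var_res <- apply(sweep(mu, 2, y), 1, var)
r2 <- var_fit / (var_fit + var_res)
c(Estimate = mean(r2), quantile(r2, c(0.025, 0.975)))
```

Because the ratio is computed per draw, the result is a distribution of $R^2$ values, which is why the output above is summarized by an estimate and an interval rather than a single number.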

## Model 2
In an attempt to better fit the observed data, the next model included distributional sub-models to allow for heterogeneous variances for each response conditional on the survey design structure.
```{r n2o_mod_2, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ 1 +
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ 1 +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod2 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 84512,
                chains = 4, 
                iter = 5000, 
                cores = 4)

save(n2o_mod2, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
```
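
As a reading aid, the structure of one response in the model above can be written out as follows. This is a sketch for a single response that ignores the residual correlation between the two responses; the symbols are introduced here for illustration only, and the log link on $\sigma$ is assumed (it is the usual default for a Gaussian distributional model of this kind).

$$
\begin{aligned}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma_i) \\
\mu_i &= \beta_0 + u_{e[i]} + u_{es[i]} + u_{esz[i]} \\
\log(\sigma_i) &= \gamma_0 + v_{e[i]} + v_{es[i]} + v_{esz[i]}
\end{aligned}
$$

Here $y_i$ is the log-transformed response for lake $i$, and the subscripts $e$, $es$, and $esz$ index the ecoregion, state-within-ecoregion, and size-class-within-state groups, respectively. Under this assumption, the `normal(-1, 2)` prior on the `sigma` intercept is a prior on $\gamma_0$, i.e., on the log of the residual standard deviation.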

### Summarize fit
The summaries of the estimated parameters and key HMC convergence diagnostics for the fitted model are printed below. 
```{r print_mod2, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")

print(n2o_mod2, prior = TRUE)
```
From the summary above, note the moderate and positive residual correlation between the two N2O responses. The estimated standard deviations for the varying group effects on the mean behavior of the dissolved N2O response suggested fairly low, but non-zero variability across each of the three levels. The standard deviations estimated for the same varying effects for equilibrium N2O were also relatively small. However, before investing too much into the interpretation of these results, the model fit was further evaluated below using a series of graphical posterior predictive checks (PPCs).

### Model checks
Below, the same PPCs were performed as with the initial model (see above for more details on each panel). 

#### Dissolved N2O
Though the checks below suggest some improvement in replicating the tails of the observed data, this model did a poorer job at replicating central tendency.
```{r ppc_n2o2, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
grid.arrange(
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlim(-5, 5) +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlim(-5, 5) +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O
The checks below suggest this model offered no improvement upon the initial model for equilibrium N2O. This model also appeared to do a poorer job of replicating the mean and overall standard deviation compared to the initial model.
```{r ppc_n2oeq2, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
grid.arrange(
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "ecdf_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
This check perhaps suggested an improvement with regard to replicating the joint density. However, the predictions were still clearly over-dispersed relative to the observations.
```{r ppc_biv2, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
The PPCs for the saturation metrics below indicated that including the distributional models was perhaps an improvement on the initial model in some aspects; in particular, the bias in the predicted proportion of under-saturated lakes was substantially decreased. However, there appeared to still be issues in replicating the tails as well as issues with central tendency.
```{r ppc_sat2, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(logn2o < logn2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

#### R-square
Relative to model 1, the $R^2$ estimate for the dissolved N2O component of this model decreased substantially. The estimate for the equilibrium N2O (N2O-eq) component was similar to that of model 1.
```{r r2_2, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
round(bayes_R2(n2o_mod2, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod2, resp = "logn2oeq", cores = 1), 3)
```
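
For reference, the kind of quantity summarized by `bayes_R2()` can be sketched in base R. The snippet below is a minimal illustration on simulated data and does not use the fitted model. The objects `y`, `mu_draws`, and the helper `bayes_r2_sketch()` are hypothetical. It assumes the residual-based definition, in which each posterior draw yields the variance of the expected values divided by that variance plus the residual variance.

```r
# Minimal sketch of a per-draw Bayesian R^2 (simulated data; all names are hypothetical).
set.seed(1)
n_obs   <- 200
n_draws <- 500
x <- rnorm(n_obs)
y <- 2 + 0.5 * x + rnorm(n_obs, sd = 0.3)

# Stand-in for posterior draws of the expected value: rows = draws, columns = observations.
b0 <- rnorm(n_draws, mean = 2.0, sd = 0.02)
b1 <- rnorm(n_draws, mean = 0.5, sd = 0.02)
mu_draws <- outer(b0, rep(1, n_obs)) + outer(b1, x)

bayes_r2_sketch <- function(y, mu_draws) {
  # Subtract y from every draw (mu - y); the sign does not affect the variance.
  resid_draws <- sweep(mu_draws, 2, y)
  var_fit <- apply(mu_draws, 1, var)     # variance of fitted values, per draw
  var_res <- apply(resid_draws, 1, var)  # variance of residuals, per draw
  var_fit / (var_fit + var_res)
}

r2 <- bayes_r2_sketch(y, mu_draws)
round(quantile(r2, c(0.025, 0.5, 0.975)), 3)
```

Because the ratio is computed draw by draw, the result is a distribution of $R^2$ values rather than a single number, which is why an estimate and an interval are reported for each response.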

## Model 3
In the next model, we used covariates to try to improve the fit. The categorical version of the NO3 covariate was used as a monotonic ordinal predictor in the dissolved N2O component of the model, along with surface temperature. For the equilibrium N2O component, we included surface temperature and log-transformed elevation, along with their interaction. Both components also retained the distributional specifications included in model 2 above.
```{r n2o_mod_3, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod3 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
  control = list(adapt_delta = 0.975, max_treedepth = 12),
  #sample_prior = "only",
  save_pars = save_pars(all = TRUE),
  seed = 98456,
  chains=4, 
  iter=5000, 
  cores=4)

save(n2o_mod3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
```
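
The monotonic term `mo(no3_cat)` in the formula above can be read as a single scale parameter multiplied by a cumulative sum of simplex weights. The sketch below illustrates that parameterization with made-up numbers. The number of categories, the value of `b`, and the weights in `zeta` are all hypothetical and are not estimates from the fitted model.

```r
# Minimal sketch of a monotonic effect (hypothetical values, not fitted estimates).
n_cat <- 5                       # assumed number of ordered categories, coded 0, 1, ..., D
D     <- n_cat - 1               # number of steps between adjacent categories
b     <- 0.4                     # hypothetical average change per step (log scale)
zeta  <- c(0.1, 0.2, 0.3, 0.4)   # hypothetical simplex: share of the total change per step

stopifnot(length(zeta) == D, abs(sum(zeta) - 1) < 1e-12)

# Effect relative to the lowest category: b * D * (cumulative sum of zeta up to that category).
mo_effect <- b * D * c(0, cumsum(zeta))
names(mo_effect) <- paste0("cat_", 0:D)
mo_effect
```

Since the weights are non-negative and sum to one, the effect can only move in one direction across categories, while the step sizes are free to differ. This is what allows a relationship that is monotonic but non-linear.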

### Summarize fit
The fitted parameters and MCMC diagnostics are below.
```{r print_mod3, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")

print(n2o_mod3, prior = T)
```
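
As a reminder of what a convergence diagnostic such as R-hat measures, the sketch below computes a classic split-R-hat on simulated chains. It is an illustration only: the helper `split_rhat_sketch()` and the simulated `draws` are hypothetical, and recent versions of brms are understood to report a rank-normalized variant that this sketch does not reproduce.

```r
# Classic split-R-hat on simulated, well-mixed chains (illustration only).
set.seed(2)
n_iter   <- 1000
n_chains <- 4
draws <- matrix(rnorm(n_iter * n_chains), nrow = n_iter, ncol = n_chains)

split_rhat_sketch <- function(draws) {
  half <- floor(nrow(draws) / 2)
  # Split each chain in half so that within-chain drift also inflates the statistic.
  halves <- cbind(draws[1:half, , drop = FALSE],
                  draws[(nrow(draws) - half + 1):nrow(draws), , drop = FALSE])
  n <- nrow(halves)
  W <- mean(apply(halves, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(halves))        # between-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

rhat <- split_rhat_sketch(draws)
rhat
```

Values close to 1 indicate that the between-chain and within-chain variances agree, which is the pattern to look for in the `Rhat` column of the summary.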

### Model checks
#### Dissolved N2O
The PPCs below indicated a better fit compared to the previous models. The central tendency and tail behavior looked to be reasonably replicated by comparison. However, the observed _vs._ predicted plot suggested that larger observed values were being systematically underestimated.
```{r ppc_full_checks_mod_n2o3, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
grid.arrange(
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlim(0, 5) +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlim(0, 5) +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),  
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"),  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "scatter_avg",  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
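
The logic behind the test-statistic panels can be sketched without the fitted model: a statistic is computed for each replicated dataset and compared with the same statistic for the observed data. The snippet below uses simulated stand-ins, so `y`, `yrep`, and the derived objects are hypothetical. The tail proportions are one possible numeric summary and are not part of the checks reported here.

```r
# Sketch of a test-statistic posterior predictive check on simulated stand-ins.
set.seed(3)
n_obs   <- 300
n_draws <- 1000
y    <- rnorm(n_obs, mean = 2, sd = 0.4)            # stand-in for observed log(N2O)
yrep <- matrix(rnorm(n_draws * n_obs, mean = 2, sd = 0.4),
               nrow = n_draws, ncol = n_obs)        # stand-in for replicated datasets

# One row per replicated dataset, mirroring a "stat_2d" panel of mean vs. sd.
stats_rep <- data.frame(mean = rowMeans(yrep),
                        sd   = apply(yrep, 1, sd))
stats_obs <- c(mean = mean(y), sd = sd(y))

# Share of replicates at or above the observed statistic; values near 0 or 1 flag misfit.
tail_prop <- c(mean = mean(stats_rep$mean >= stats_obs["mean"]),
               sd   = mean(stats_rep$sd   >= stats_obs["sd"]))
tail_prop
```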

#### Equilibrium N2O
The PPCs below suggested that this model was also an improvement for equilibrium N2O. However, some checks (e.g., skewness) indicated room for further improvement.
```{r ppc_full_checks_mod_n2oeq3, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
grid.arrange(
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "dens_overlay", 
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "ecdf_overlay", 
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"),  
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000, 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"),  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "scatter_avg",  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
The check for the joint distribution below also suggested an improvement upon the previous models.
```{r ppc_bv_check_mod_n2o3, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
This model looked to be an improvement with regard to the PPCs for the saturation metrics. However, the proportion of under-saturated lakes remained biased low and other checks indicated that further improvements would be ideal.
```{r ppc_sat_check_mod_n2o3, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(exp(logn2o) < exp(logn2oeq), 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
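
The under-saturation check above reduces to a proportion computed once per posterior draw. The sketch below shows that calculation on simulated matrices, with every object name and distribution hypothetical. It uses `mean()` and `rowMeans()` rather than dividing by a fixed count such as 984, on the assumption that the fixed count equals the number of observations.

```r
# Sketch of the per-draw proportion of under-saturated waterbodies (simulated stand-ins).
set.seed(4)
n_obs   <- 250
n_draws <- 400

log_n2o_obs   <- rnorm(n_obs, 2.0, 0.3)   # stand-in for observed log(N2O)
log_n2oeq_obs <- rnorm(n_obs, 1.9, 0.1)   # stand-in for observed log(equilibrium N2O)

# Rows = draws, columns = observations.
log_n2o_rep   <- matrix(rnorm(n_draws * n_obs, 2.0, 0.3), nrow = n_draws)
log_n2oeq_rep <- matrix(rnorm(n_draws * n_obs, 1.9, 0.1), nrow = n_draws)

# exp() is monotonic, so comparing logs is equivalent to comparing concentrations.
prop_obs <- mean(log_n2o_obs < log_n2oeq_obs)
prop_rep <- rowMeans(log_n2o_rep < log_n2oeq_rep)

# Where the observed proportion falls within the replicated distribution.
c(observed = prop_obs, quantile(prop_rep, c(0.025, 0.5, 0.975)))
```

Computing the proportion from the data dimensions keeps the check valid if rows are later added to or dropped from the model data.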

#### R-square
The $R^2$ estimates below suggested substantial improvements over the previous models.
```{r r2_mod_n2o3, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
round(bayes_R2(n2o_mod3, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod3, resp = "logn2oeq", cores = 1), 3)
```

### Covariate effects
Below are plots illustrating the modeled effects of covariates on both N2O and equilibrium N2O.

#### N2O
The conditional effects plots below for N2O illustrate a positive, monotonic, and non-linear relationship between NO3 and N2O; and a negative, linear relationship between surface temperature and N2O.
```{r conditional_effects_mod_n2o3, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=3}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")

p1 <- conditional_effects(n2o_mod3, 
                          resp = "logn2o", 
                          effects = c("no3_cat"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod3, 
                          resp = "logn2o", 
                          effects = c("surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
          plot(p2, plot = F)[[1]],
          ncol = 2)

rm(p1, p2)

annotate_figure(plt, top = text_grob("N2O: conditional effects", 
               color = "black", face = "bold", size = 14))
```

#### Equilibrium N2O
The modeled effects below for the equilibrium N2O component of the model illustrated a negative relationship between equilibrium N2O and both predictors and an interaction such that the surface temperature effect became slightly steeper at lower elevations.
```{r conditional_effects_mod_n2oeq3, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=3}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
p1 <- conditional_effects(n2o_mod3, 
                          resp = "logn2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod3, 
                          resp = "logn2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob("Equilibrium N2O: conditional effects", 
               color = "black", face = "bold", size = 14))
```
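
The interaction described above means that the temperature slope is itself a linear function of log-elevation. The arithmetic can be sketched with made-up coefficients. The values of `b_temp`, `b_elev`, `b_int`, and the grid of log-elevations are hypothetical and were chosen only to reproduce the qualitative pattern, not taken from the fitted model.

```r
# Sketch of how an interaction changes a slope (hypothetical coefficients).
b_temp <- -0.020   # hypothetical main effect of surface temperature
b_elev <- -0.100   # hypothetical main effect of log-elevation
b_int  <-  0.002   # hypothetical surftemp:log_elev interaction

log_elev_grid <- c(low = 2, mid = 5, high = 8)

# Slope of log(equilibrium N2O) with respect to surface temperature at each elevation.
slope_temp <- b_temp + b_int * log_elev_grid
slope_temp

# Expected value at a given temperature and elevation, relative to the intercept.
lin_pred <- function(surftemp, log_elev) {
  b_temp * surftemp + b_elev * log_elev + b_int * surftemp * log_elev
}
lin_pred(surftemp = 20, log_elev = log_elev_grid)
```

With a positive interaction and a negative main effect, the temperature slope is most negative at the lowest elevation, which corresponds to a steeper temperature effect at lower elevations.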

## Model 4
In the next model, covariate terms were also included in the $\sigma$ components of both models in order to try to better capture remaining heterogeneity in the variances of both N2O and N2O-eq.
```{r n2o_mod_mv_4, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ mo(no3_cat) +
               surftemp +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod4 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
  control = list(adapt_delta = 0.975, max_treedepth = 12),
  #sample_prior = "only",
  save_pars = save_pars(all = TRUE),
  seed = 15851,
  chains=4, 
  iter=5000, 
  cores=4)

save(n2o_mod4, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
```
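
The idea of placing covariates on $\sigma$ can be illustrated by simulating from a model in which the residual standard deviation depends on a predictor through a log link. The snippet below is a sketch with hypothetical coefficients and a single generic covariate `x`. It assumes the log link that brms is understood to use by default when `sigma` is given its own formula.

```r
# Sketch of a distributional (heteroscedastic) Gaussian model, simulated in base R.
set.seed(5)
n <- 2000
x <- runif(n, -1, 1)             # generic covariate (hypothetical)

mu        <- 2 + 0.3 * x         # linear predictor for the mean
log_sigma <- -1 + 0.8 * x        # linear predictor for sigma on the log scale
sigma     <- exp(log_sigma)      # back-transform so that sigma is always positive

y <- rnorm(n, mean = mu, sd = sigma)

# Residual spread grows with x, which a constant-sigma model could not reproduce.
resid   <- y - mu
sd_low  <- sd(resid[x < 0])
sd_high <- sd(resid[x >= 0])
c(sd_low = sd_low, sd_high = sd_high)
```

A coefficient on the log scale acts multiplicatively on $\sigma$, so a one-unit increase in `x` here scales the residual standard deviation by `exp(0.8)`.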

### Summarize fit
Below is a summary of the fitted parameters along with some convergence diagnostics.
```{r print_mod4, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

print(n2o_mod4, prior = T, digits = 3)
```

### Model checks
The same PPCs used above were applied to this model.

#### Dissolved N2O
This model again appeared to be an improvement on the previous model, particularly with regard to the more constant variance indicated in the observed _vs._ predicted plot (bottom, right panel).
```{r ppc_full_checks_mod_n2o4, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
grid.arrange(
pp_check(n2o_mod4, 
         ndraws = 200,
         resp = "logn2o",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod4, 
         ndraws = 200,
         resp = "logn2o",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         resp = "logn2o",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O
This component of the model also seemed to be an improvement over model 3, with better representation in the tails as indicated in the skewness _vs._ kurtosis PPC. 
```{r ppc_full_checks_mod_n2oeq4, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
grid.arrange(
pp_check(n2o_mod4, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod4, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
Again, the check for the joint distribution suggested an improvement over the previous model, with the PPC fitting the observed bivariate density more tightly.
```{r ppc_bv_check_mod_n2o4, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
This check also suggested an improvement over the previous models, with better tail behavior and less bias in the proportion under-saturated measure.
```{r ppc_sat_check_mod_n2o4, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(exp(logn2o) < exp(logn2oeq), 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

#### R-square
The Bayesian $R^2$ estimates below indicated an improvement from the previous models.
```{r r2_mod_n2o4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
round(bayes_R2(n2o_mod4, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod4, resp = "logn2oeq", cores = 1), 3)
```

### Covariate effects
#### N2O
The conditional effects plots for the covariate effects on N2O remained largely unchanged from the previous model.
```{r conditional_effects_mod_n2o4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o", 
                          effects = c("no3_cat"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o", 
                          effects = c("surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
          plot(p2, plot = F)[[1]],
          ncol = 2)

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

Below are estimates of the conditional effects of the covariates on $\sigma$ for N2O. These plots suggested a large effect of NO3 on the variance of N2O, but little to no effect of surface temperature.
```{r conditional_effects_sigma_n2o4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("no3_cat"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, 
                top = text_grob(expression(paste("N2O: covariate effects on ", sigma)),
                                color = "black",
                                face = "bold",
                                size = 14))
```

#### Equilibrium N2O
The covariate effects on equilibrium N2O remained largely the same as for the previous model.
```{r conditional_effects_mod_n2oeq4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)
plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

The covariate effects on $\sigma$ for N2O-eq suggested a negative effect of surface temperature and little to no effect of elevation.
```{r conditional_effects_sigma_n2oeq4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("surftemp"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("log_elev"), 
                          plot = F)
plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", sigma)), 
               color = "black", face = "bold", size = 14))
```

## Model 5
In the next model, more complexity was added to the N2O component by including a covariate for lake surface area (log scale) as well as interactions of NO3 with log(area) and with surface temperature.
```{r n2o_mod_mv_5, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               log_area +
               surftemp + 
               mo(no3_cat):log_area +
               mo(no3_cat):surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ log_area +
               mo(no3_cat) +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod5 <- brm(bf_n2o + 
                  bf_n2oeq +
                  set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 54741,
                chains = 4, 
                iter = 5000, 
                cores = 4)

save(n2o_mod5, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
```

### Summarize fit
Below is a summary of the fitted parameters along with MCMC convergence diagnostics.
```{r print_mod5, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

print(n2o_mod5, prior = T, digits = 3)
```

### Model checks
Again, the same PPCs as above were performed for this model.

#### N2O PPC
The PPCs for N2O looked similar to those for the previous model.
```{r ppc_full_checks_mod_n2o5, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
grid.arrange(
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2o",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2o",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O
Again, the PPCs for this model were similar to those for the previous model, which was unsurprising given that the N2O-eq submodel was unchanged.
```{r ppc_full_checks_mod_n2oeq5, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
grid.arrange(
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
This PPC was also similar to that for the previous model.
```{r ppc_bv_check_mod_n2o5, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
This check was also similar to that for the previous model, with perhaps slightly less bias in the estimated proportion of undersaturated waterbodies. There was also a potentially concerning extreme prediction in the observed _vs._ predicted PPC.
```{r ppc_sat_check_mod_n2o5, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(exp(logn2o) < exp(logn2oeq), 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
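
The proportion-undersaturated check above divides by a hard-coded 984. The block below is a minimal sketch of the same statistic computed with `rowMeans()` so that the divisor follows the data. It assumes 984 is the number of rows in `df_model`, and it uses simulated stand-ins rather than the fitted model: the objects `y_n2o`, `y_n2oeq`, `yrep_n2o`, and `yrep_n2oeq` and all distribution settings are hypothetical.

```r
set.seed(42)

n_obs   <- 984  # assumed sample size (the divisor used in the chunk above)
n_draws <- 500  # number of predictive draws

# Simulated stand-ins for the observed responses (hypothetical values)
y_n2o   <- rlnorm(n_obs, meanlog = 2, sdlog = 0.3)
y_n2oeq <- rlnorm(n_obs, meanlog = 2, sdlog = 0.1)

# Simulated stand-ins for predictive draws: one row per draw, one column per lake
yrep_n2o   <- matrix(rlnorm(n_draws * n_obs, 2, 0.3), nrow = n_draws)
yrep_n2oeq <- matrix(rlnorm(n_draws * n_obs, 2, 0.1), nrow = n_draws)

# Proportion of undersaturated lakes in each draw and in the observed data
prop_pred <- rowMeans(yrep_n2o < yrep_n2oeq)
prop_obs  <- mean(y_n2o < y_n2oeq)

# Share of draws with a proportion at least as large as the observed one
mean(prop_pred >= prop_obs)
```

Values of the final summary near 0 or 1 would indicate that the replicated proportions sit systematically to one side of the observed proportion, which is the bias that the histogram above shows visually.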

#### R-square
```{r r2_mod_n2o5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
round(bayes_R2(n2o_mod5, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod5, resp = "logn2oeq", cores = 1), 3)
```

### Covariate effects

#### N2O
The conditional effects plots for N2O suggested a similar effect of NO3, along with interactions of NO3 with both lake area and surface temperature. The effect of lake area was estimated to be larger and more negative at the highest levels of NO3 and only slightly negative at the lowest level. The effect of surface temperature was estimated to be largest and positive at the highest level of NO3 and negative at the lowest level.
```{r conditional_effects_mod_n2o5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o", 
                          effects = c("no3_cat"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o", 
                          effects = c("log_area:no3_cat"), 
                          plot = F)
p3 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o", 
                          effects = c("surftemp:no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
                 plot(p2, plot = F)[[1]],
                 plot(p3, plot = F)[[1]])

rm(p1, p2, p3)

annotate_figure(plt, top = text_grob(expression(paste("N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

The estimated covariate effects on $\sigma$ suggested a negative relationship with log(area) and a positive relationship, again, with NO3.
```{r conditional_effects_sigma_n2o5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("log_area"),
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, 
                top = text_grob(expression(paste("N2O: covariate effects on ", sigma)),
                                color = "black",
                                face = "bold",
                                size = 14))
```

#### Equilibrium N2O
The estimated covariate effects on N2O-eq remained largely the same as estimated in the previous model.
```{r conditional_effects_mod_n2oeq5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

```{r conditional_effects_sigma_n2oeq5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("surftemp"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("log_elev"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", sigma)), 
               color = "black", face = "bold", size = 14))
```

## A Final Model
As demonstrated above, models excluding the NO3 covariate consistently resulted in poorer fits to the observed dissolved N2O data. Including surface temperature and elevation in the equilibrium N2O part of the model resulted in substantially improved replication of key aspects of the observed data. Likewise, added flexibility in the distributional terms for both dissolved and equilibrium N2O led to improvements. 

To make inferences from this model for N2O in the population of interest, however, the included covariates needed to be (1) fully observed across that population or (2) their missingness needed to be modeled. For the lake area and elevation covariates, data _were_ available for all lakes from previously compiled geospatial databases. However, neither surface temperature nor NO3 was observed for lakes outside of the sample; both were only partially observed with respect to the target population, and their missingness needed to be accounted for in the model. Therefore, a more complex model was constructed below that included surface temperature and NO3 as additional responses conditioned on the survey design variables and fully observed covariates. This approach to inference for N2O was similar to a Bayesian structural equation model [@Merkle_etal_2021; @Merkle_Rosseel_2018]. The main details of the logical dependence structure could be characterized as:

$$\begin{align} 
\color{#1F449C}{\boldsymbol{N_2O_{diss}}} &=\sim Survey + Area + \color{#F05039}{\boldsymbol{NO_3}} + \color{#EEBAB4}{\boldsymbol{Temp}} \\ 
\color{#A8B6CC}{\boldsymbol{N_2O_{equil}}} &=\sim Survey + Elev  + \color{#EEBAB4}{\boldsymbol{Temp}}\\
\color{#F05039}{\boldsymbol{NO_3}} &=\sim Survey + Area + \color{#EEBAB4}{\boldsymbol{Temp}} \\ 
\color{#EEBAB4}{\boldsymbol{Temp}} &=\sim Survey + Lat + Elev + Day
\end{align}$$

Variables in colored text above were treated as partially observed with respect to the population of interest (i.e., observed only in the sample), whereas variables in black text were considered fully observed. The partially observed variables, namely dissolved and equilibrium N2O, NO3, and surface temperature, were each modeled conditional on the survey design variables and other partially and/or fully observed covariates. This structural equation approach required a more complex set of post-processing steps than a typical MRP analysis. In order to propagate estimates and uncertainty through the dependency structure and make inferences, the fitted model was first used to predict surface temperature in the target population, since it depended only on the fully observed covariates. That predictive distribution was then used alongside the relevant fully observed covariates to predict NO3 in the target population. Finally, the predictive distributions for temperature and NO3 were used to predict the N2O responses. These steps were carried out in the "Predict to population" section to follow.
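
The propagation steps described above can be illustrated with a toy example. The block below is a minimal sketch in base R and is not the workflow used in this document: the simulated population, every coefficient, and the helper names `predict_temp()`, `predict_no3()`, and `predict_n2o()` are hypothetical, and fixed parameter values stand in for the posterior draws of a fitted model. It only shows the order of operations, in which each partially observed variable is drawn from its predictive distribution and then passed as an input to the next submodel.

```r
set.seed(1)

# Hypothetical target population containing only fully observed covariates
n_pop <- 200
pop <- data.frame(lat      = runif(n_pop, 25, 49),
                  log_elev = rnorm(n_pop, 5, 1),
                  log_area = rnorm(n_pop, 3, 1))

n_draws <- 100  # number of predictive draws to propagate

# Step 1: surface temperature depends only on fully observed covariates
predict_temp <- function(pop, shape = 50) {
  mu <- exp(3.8 - 0.02 * pop$lat - 0.03 * pop$log_elev)
  rgamma(nrow(pop), shape = shape, rate = shape / mu)
}

# Step 2: NO3 category (3 ordered levels here) depends on predicted temperature
predict_no3 <- function(pop, temp, cuts = c(0, 2)) {
  eta <- 0.05 * temp - 0.3 * pop$log_area
  u <- runif(nrow(pop))
  # Cumulative logit: P(Y <= k) = plogis(cuts[k] - eta)
  1L + (u > plogis(cuts[1] - eta)) + (u > plogis(cuts[2] - eta))
}

# Step 3: dissolved N2O depends on predicted NO3 and predicted temperature
predict_n2o <- function(pop, temp, no3, shape = 20) {
  mu <- exp(1.8 + 0.2 * (no3 - 1) - 0.01 * temp - 0.02 * pop$log_area)
  rgamma(nrow(pop), shape = shape, rate = shape / mu)
}

# Chain the three steps once per draw; result has one column per draw
n2o_draws <- replicate(n_draws, {
  temp <- predict_temp(pop)
  no3  <- predict_no3(pop, temp)
  predict_n2o(pop, temp, no3)
})

# Population mean for each draw, then a summary across draws
pop_mean <- colMeans(n2o_draws)
quantile(pop_mean, c(0.025, 0.5, 0.975))
```

Because temperature and NO3 are redrawn within every iteration, their predictive uncertainty carries through to the spread of `pop_mean` rather than being fixed at a point estimate.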

In the final model below, the submodel for surface temperature assumed Gamma-distributed errors, and the linear predictor included the survey design variables, latitude, elevation, and Julian date. The shape parameter was also modeled as a function of latitude to address increasing response variance along the latitudinal gradient. The NO3 submodel was a cumulative logit formulation and the linear predictor included all of the survey factors as well as surface temperature and lake area. 
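
For reference, the link between the shape parameter and the response variance can be written out under the mean and shape parameterization of the Gamma family in brms. The symbols $\mu_i$ (mean), $\alpha_i$ (shape), and $\gamma_0, \gamma_1$ (shape coefficients) are introduced here for illustration only, and the log link on the shape is assumed to be the brms default:

$$\begin{align}
Temp_i &\sim \text{Gamma}\left(\alpha_i, \; \alpha_i / \mu_i\right) \\
E(Temp_i) &= \mu_i \\
Var(Temp_i) &= \mu_i^2 / \alpha_i \\
\log(\alpha_i) &= \gamma_0 + \gamma_1 Lat_i
\end{align}$$

At a given mean, a smaller shape implies a larger variance, so a latitude-dependent shape lets the spread of surface temperature change along the latitudinal gradient.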

The N2O and N2O-eq responses were each modeled with Gamma distributed errors, but with the same covariate structure as in model 5. The same structure was also employed for the shape terms in these responses, corresponding to the $\sigma$ terms in the previous model. Though not shown in this document, the Gamma error structure appeared to result in slightly better performance in the predictive checks compared to the Gaussian errors in previous models. This was primarily apparent in the saturation ratio checks, which may have been more sensitive to model performance in the tails of the N2O responses. Others have also indicated that the Gamma error distribution can work well for dissolved N2O data [@Webb_etal_2019].

Note that there was no residual correlation term for this model, since residual correlations are not defined for the Gamma and cumulative logit families. Dropping the observation-level residual correlation term was deemed a reasonable compromise that enabled modeling the missingness of NO3, in particular. Nevertheless, the random intercepts again allowed for potential correlations between responses at the group levels. 
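
The group-level correlation structure can be sketched as follows. Group-level terms that share an ID in the brms formula syntax (e.g., `| a |` for `WSA9`) are modeled as jointly distributed across responses. Writing $\boldsymbol{u}_j$ for the stacked group-level terms of ecoregion $j$ (the four response intercepts plus the NO3 slope terms of dissolved N2O) and $\boldsymbol{\tau}$ for their standard deviations, both symbols being notation assumed here for illustration:

$$\begin{align}
\boldsymbol{u}_j &\sim \text{MVN}\left(\boldsymbol{0}, \boldsymbol{\Sigma}\right) \\
\boldsymbol{\Sigma} &= \text{diag}(\boldsymbol{\tau}) \, \boldsymbol{R} \, \text{diag}(\boldsymbol{\tau}) \\
\boldsymbol{R} &\sim \text{LKJ}(2)
\end{align}$$

The off-diagonal elements of $\boldsymbol{R}$ are what allow, for example, ecoregions with higher than average dissolved N2O to also have higher or lower than average surface temperature. The same form applies to the `b` and `c` IDs at the finer grouping levels.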

```{r n2o_mod_mv_6, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(n2o ~ mo(no3_cat) +
               log_area +
               surftemp + 
               mo(no3_cat):log_area +
               mo(no3_cat):surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             shape ~ log_area +
               mo(no3_cat) +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = Gamma(link = "log"))

bf_n2oeq <- bf(n2o_eq ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             shape ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = Gamma(link = "log"))

bf_temp <- bf(surftemp ~ lat +
                s(log_elev) +
                s(jdate) +
                (1 | a | WSA9) + 
                (1 | b | WSA9:state) +
                (1 | c | WSA9:state:size_cat),
              shape ~ lat,
              family = Gamma(link = "log"))

bf_no3 <- bf(no3_cat ~ surftemp +
               log_area +
               (1 | a | WSA9) +
               (1 | b | WSA9:state) +
               (1 | c | WSA9:state:size_cat),
             family = cumulative(link = "logit", threshold="flexible"))

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "n2o"),
  prior(normal(0, 1), class = "b", resp = "n2o"),
  prior(exponential(2), class = "sd", resp = "n2o"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2o"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2o"),
  prior(exponential(2), class = "sd", dpar = "shape", resp = "n2o"),
  
  prior(normal(2, 1), class = "Intercept", resp = "n2oeq"), 
  prior(normal(0, 1), class = "b", resp = "n2oeq"),  
  prior(exponential(2), class = "sd", resp = "n2oeq"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2oeq"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2oeq"),
  prior(exponential(2), class = "sd", dpar = "shape", resp = "n2oeq"),
  
  prior(normal(3, 1), class = "Intercept", resp = "surftemp"), 
  prior(normal(0, 1), class = "b", resp = "surftemp"), 
  prior(exponential(0.5), class = "sds", resp = "surftemp"),
  prior(exponential(2), class = "sd", resp = "surftemp"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "surftemp"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "surftemp"),
  
  prior(normal(0, 3), class = "Intercept", resp = "no3cat"),
  prior(normal(0, 1), class = "b", resp = "no3cat"),
  prior(exponential(1), class = "sd", resp = "no3cat"),
  
  prior(lkj(2), class = "cor")
  )

n2o_mod6 <- brm(bf_n2o + 
                  bf_n2oeq + 
                  bf_temp + 
                  bf_no3 + 
                  set_rescor(rescor = FALSE),
                data = df_model, 
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 14),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 85132,#14548,
                #init = my_inits,
                init_r = 0.5,
                chains = 4, 
                iter = 5000, 
                cores = 4)

save(n2o_mod6, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
```

### Summarize fit
Below is a summary of the fitted parameters and MCMC diagnostics. 
```{r print_mod6, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

print(n2o_mod6, digits = 3, prior = T)
```

### Model checks
Below, the same PPCs for N2O and N2O-eq were employed as before.

#### N2O PPC
The PPCs for N2O from this model were about as reasonable as those for models 4 and 5 above.
```{r ppc_full_checks_mod_n2o6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
grid.arrange(
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "dens_overlay",
         ndraws = 40,
         cores = 1) + 
  theme_tidybayes() +
  scale_x_continuous(trans = "log") +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "ecdf_overlay",
         ndraws = 40, 
         cores = 1) + 
  theme_tidybayes() +
  scale_x_continuous(trans = "log") +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "scatter_avg", 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O PPC
Again, the PPCs for N2O-eq in this model were similar to those for models 4 and 5.
```{r ppc_full_checks_mod_n2oeq6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
grid.arrange(
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "dens_overlay",
         ndraws = 40,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "ecdf_overlay",
         ndraws = 40, 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "scatter_avg", 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate PPC
This model again provided a very reasonable representation of the bivariate relationship between N2O and N2O-eq (below).
```{r ppc_bv_check_mod_n2o6, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
df_model %>%
  add_predicted_draws(n2o_mod6,
                      resp = c("n2o","n2oeq"),
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = log(n2o), 
                      y = log(n2oeq), 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation PPC
The saturation ratio PPCs below showed behavior similar to that of models 4 and 5 above, but with perhaps slightly less bias in the predictions for the proportion of undersaturated waterbodies and fewer extreme predictions for the means and standard deviations. The observed _vs._ predicted PPC also appeared to have a better-behaved variance and no extreme predictions, compared to models 4 and 5 with the lognormal errors.
```{r ppc_sat_check_mod_n2o6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"),
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(n2o < n2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
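
The proportion-undersaturated check above divides by a hard-coded sample size (984). As a minimal sketch, the same statistic can be computed with `mean()` and `rowMeans()` so that the denominator follows the data. The matrices below are simulated stand-ins with hypothetical names (`n2o_rep`, `n2oeq_rep`); they are not output from `n2o_mod6`.

```r
set.seed(1)
n_draws <- 200
n_obs <- 50

# Simulated stand-ins for posterior predictive draws (rows = draws, cols = lakes)
n2o_rep <- matrix(rlnorm(n_draws * n_obs, meanlog = 2.0, sdlog = 0.4),
                  nrow = n_draws, ncol = n_obs)
n2oeq_rep <- matrix(rlnorm(n_draws * n_obs, meanlog = 2.1, sdlog = 0.1),
                    nrow = n_draws, ncol = n_obs)

# Simulated stand-ins for the observed values
n2o_obs <- rlnorm(n_obs, meanlog = 2.0, sdlog = 0.4)
n2oeq_obs <- rlnorm(n_obs, meanlog = 2.1, sdlog = 0.1)

# Proportion of undersaturated lakes in each replicated data set
prop_rep <- rowMeans(n2o_rep < n2oeq_rep)

# Observed proportion, computed the same way
prop_obs <- mean(n2o_obs < n2oeq_obs)

# Share of replicates with a proportion at or above the observed one
ppp <- mean(prop_rep >= prop_obs)
```

A value of `ppp` near 0 or 1 would indicate that the replicated proportions sit systematically on one side of the observed proportion.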

The plot below shows the same PPCs for the "test" (second-visit) data. Overall, the model appeared to perform about as well on these data as on the data used to fit it.
```{r ppc_sat_check_testdata_mod_n2o6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
grid.arrange(
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"),
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(n2o < n2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 95) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_test, mapping = aes(xintercept = sum(n2o < n2o_eq)/95)) + # observed proportion in the second-visit data
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

#### R-square
Below are estimates of the Bayesian $R^2$. The estimates for N2O and N2O-eq were largely similar to those for models 4 and 5 above. The $R^2$ for the surface temperature response also suggested a fairly good fit.
```{r r2_mod_n2o6, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
round(bayes_R2(n2o_mod6, resp = "n2o"), 3) 
round(bayes_R2(n2o_mod6, resp = "n2oeq"), 3)
round(bayes_R2(n2o_mod6, resp = "surftemp"), 3)
```

Below are the same $R^2$ estimates, but for the second-visit data. That these estimates are similar to those for the data used to fit the model suggests that the model may perform similarly well out-of-sample.
```{r r2_test_mod_n2o6, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
round(bayes_R2(n2o_mod6, resp = "n2o", newdata = df_test), 3) 
round(bayes_R2(n2o_mod6, resp = "n2oeq", newdata = df_test), 3)
round(bayes_R2(n2o_mod6, resp = "surftemp", newdata = df_test), 3)
```
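
For readers unfamiliar with the statistic, the sketch below computes a Bayesian $R^2$ by hand as the per-draw ratio of the variance of the predictions to the sum of that variance and the residual variance. This is the definition that `bayes_R2()` is understood to follow, which should be treated as an assumption here. The data and the function name `bayes_r2_sketch` are hypothetical.

```r
set.seed(2)
n_draws <- 100
n_obs <- 40

# Simulated response and stand-in expected-value draws
# (rows = posterior draws, cols = observations)
y <- rnorm(n_obs, mean = 10, sd = 2)
ypred <- matrix(rep(y, each = n_draws), nrow = n_draws, ncol = n_obs) +
  rnorm(n_draws * n_obs, sd = 1)

bayes_r2_sketch <- function(y, ypred) {
  resid <- sweep(ypred, 2, y)        # ypred - y for every draw
  var_fit <- apply(ypred, 1, var)    # variance of predictions, per draw
  var_res <- apply(resid, 1, var)    # variance of residuals, per draw
  var_fit / (var_fit + var_res)      # one R2 value per draw
}

r2_draws <- bayes_r2_sketch(y, ypred)

# Posterior summary in the spirit of the tables above
round(c(Estimate = mean(r2_draws),
        quantile(r2_draws, probs = c(0.025, 0.975))), 3)
```

Because the ratio is computed per draw, the result is a posterior distribution for $R^2$ rather than a single value.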

### Covariate effects
#### N2O
The conditional effects plots for N2O suggested a similar effect of NO3, but interesting interactions between NO3 and lake area and between NO3 and surface temperature. For lake area, the effect was estimated to be larger and more negative at the highest levels of NO3, and slightly negative at the lowest level of NO3. For surface temperature, the effect was estimated to be largest and positive at the highest level of NO3, and negative at the lowest level of NO3.
```{r conditional_effects_mod_n2oF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2o", 
                          effects = c("no3_cat"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2o", 
                          effects = c("log_area:no3_cat"), 
                          plot = F)
p3 <- conditional_effects(n2o_mod6, 
                          resp = "n2o", 
                          effects = c("surftemp:no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
                 plot(p2, plot = F)[[1]],
                 plot(p3, plot = F)[[1]])

rm(p1, p2, p3)

annotate_figure(plt, top = text_grob(expression(paste("N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

The estimated covariate effects on the shape parameter suggested a negative relationship with log(area) and, again, a positive relationship with NO3.
```{r conditional_effects_sigma_n2oF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2o",
                          dpar = "shape",
                          effects = c("log_area"),
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2o",
                          dpar = "shape",
                          effects = c("no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, 
                top = text_grob(expression(paste("N2O: covariate effects on ", shape)),
                                color = "black",
                                face = "bold",
                                size = 14))
```

#### Equilibrium N2O
The estimated covariate effects on equilibrium N2O remained largely the same as estimated in the previous model.
```{r conditional_effects_mod_n2oeqF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

```{r conditional_effects_sigma_n2oeqF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq",
                          dpar = "shape",
                          effects = c("surftemp"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq",
                          dpar = "shape",
                          effects = c("log_elev"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", shape)), 
               color = "black", face = "bold", size = 14))
```

# Predict to population
As previously described, in order to make inferences to the population of interest, the final model above was used to, first, predict surface temperature in the target population, since it depended only on the fully observed covariates. Next, the predictive distribution for surface temperature was used, along with the relevant fully observed covariates, to predict NO3 in the target population. Finally, the predictive distributions for temperature and NO3 were used to predict the N2O responses. The code for these steps is outlined in the following.
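
Before the project code, the chained logic can be illustrated with a toy simulation in which each predicted quantity feeds the next step, so that uncertainty carries forward. All coefficients, distributions, and names below are hypothetical and chosen only for illustration; they are not taken from `n2o_mod6`.

```r
set.seed(3)
n_lakes <- 5
n_draws <- 4

# One row per lake and draw; log_area plays the role of a fully observed covariate
frame <- data.frame(
  lake = rep(seq_len(n_lakes), each = n_draws),
  draw = rep(seq_len(n_draws), times = n_lakes),
  log_area = rep(rnorm(n_lakes), each = n_draws)
)

# Step 1: predict surface temperature from the observed covariate
frame$surftemp <- rnorm(nrow(frame), mean = 24 - 0.5 * frame$log_area, sd = 2)

# Step 2: predict a nitrate category (1 = low, 2 = high) from predicted temperature
p_high <- plogis(-1 + 0.05 * (frame$surftemp - 24))
frame$no3_cat <- rbinom(nrow(frame), size = 1, prob = p_high) + 1L

# Step 3: predict N2O from predicted temperature and nitrate (gamma errors)
mu_n2o <- exp(2 + 0.3 * (frame$no3_cat - 1) - 0.01 * frame$surftemp)
frame$n2o <- rgamma(nrow(frame), shape = 5, rate = 5 / mu_n2o)

head(frame)
```

Each row carries its own temperature and nitrate draw into the N2O step, which is the property the workflow below relies on.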

The first step used the final model to predict to the population:
```{r predict_obsframe, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

predict_temp <- sframe %>%
  mutate(jdate = 205) %>%
  add_predicted_draws(n2o_mod6, resp=c("surftemp"), 
                      allow_new_levels = TRUE, 
                      cores =1, 
                      ndraws = 500) %>%
  mutate(surftemp = .prediction)

save(predict_temp, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")
```

NO3 was predicted next. Note that the posterior predictive distribution for NO3 was subsampled (one draw retained per row) in order to minimize excess simulations.
```{r parallel_predict_draws, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

temp_X <- predict_temp %>% # select relevant columns as predictors
  ungroup() %>%
  # .row and .draw are not predictors, so they are dropped here
  select(WSA9, state, size_cat, log_area, surftemp)


rm(predict_temp) # reduce memory
gc()

# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(5) 
doSNOW::registerDoSNOW(cl) 

# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

system.time( # approx 26 hrs with 5 workers & 500 draws from PPD
predict_no3 <- foreach(sub_X = isplitRows(temp_X, chunkSize = 155299), 
                       .combine = 'c',
                       .packages = c("brms"),
                       .options.snow = opts
                       ) %dopar% {
                         apply(brms::posterior_predict(n2o_mod6,
                                                 newdata = sub_X,
                                                 resp = "no3cat",
                                                 allow_new_levels = T,
                                                 ndraws = 500,
                                                 cores = 1), 2, sample, 1)
                         }
)


close(pb)
parallel::stopCluster(cl)

save(predict_no3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
```
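
The subsampling step above (`apply(..., 2, sample, 1)`) keeps one randomly chosen posterior predictive draw for each row of the new data. A minimal sketch on a small simulated matrix is below. The object `pp` is a hypothetical stand-in for the matrix returned by `posterior_predict()` for a single response, assumed to have draws in rows and new-data rows in columns.

```r
set.seed(4)
n_draws <- 500
n_rows <- 6

# Stand-in for predicted nitrate categories: rows = draws, cols = new-data rows
pp <- matrix(sample(1:4, n_draws * n_rows, replace = TRUE),
             nrow = n_draws, ncol = n_rows)

# Same pattern as in the chunk above: one sampled value per column
one_draw <- apply(pp, 2, sample, 1)

# Equivalent index-based version: choose a draw number per column, then extract
idx <- sample.int(n_draws, size = n_rows, replace = TRUE)
one_draw_idx <- pp[cbind(idx, seq_len(n_rows))]
```

The index-based version avoids calling `sample()` once per column, which may matter for very wide matrices, though that has not been benchmarked here.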

Finally, N2O and N2O-eq were predicted using the surface temperature and nitrate predictions along with the survey variables and known covariates. Again, the posterior was subsampled in order to reduce excess simulations.
```{r n2o_covariates_X, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

# Assemble dataframe containing relevant covariates (known and predicted)
n2o_X <- predict_temp %>%
  ungroup() %>%
  mutate(no3_cat = predict_no3) %>%
  select(WSA9,
         state,
         size_cat,
         log_area,
         surftemp,
         log_elev,
         no3_cat)

# clear objects to reduce memory overhead
rm(predict_no3, predict_temp) 
gc()

# save the predictors for n2o and n2oeq
save(n2o_X, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")
```

```{r parallel_predict, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")

# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(6) 
doSNOW::registerDoSNOW(cl) 

# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

# make predictions in parallel
system.time(
predict_n2o <- foreach(sub_X = isplitRows(n2o_X, chunkSize = 155299),
                 .combine = rbind,
                 .options.snow = opts,
                 .packages = c("brms")) %dopar% {
  apply(posterior_predict(n2o_mod6,
                          newdata = sub_X,
                          resp = c("n2o", "n2oeq"),
                          allow_new_levels = T,
                          ndraws = 500,
                          cores = 1),
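        # posterior_predict() returns a draws x rows x responses array when two
        # responses are requested; margins 2 and 3 keep one draw per row for each
        # response, giving the two-column result named below
        c(2, 3), sample, 1)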
                   }
)

close(pb)
parallel::stopCluster(cl)

colnames(predict_n2o) <- c("n2o", "n2oeq")

save(predict_n2o, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
```

The predictions for all four partially observed responses were then assembled into a new dataframe for use in inference.
```{r assemble_predictions, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

all_predictions <- predict_temp %>%
  ungroup() %>%
  mutate(no3cat = predict_no3) %>%
  bind_cols(predict_n2o) %>%
  mutate(n2osat = n2o / n2oeq, # calculate saturation ratio
         .row = rep(1:465897, each = 500),
         .draw = rep(seq(1,500, 1), 465897)) %>%
  mutate(area_ha = exp(log_area)) %>% # include area on ha scale
  select(WSA9,
         state,
         size_cat,
         area_ha,
         lat,
         lon,
         .row,
         .draw,
         surftemp,
         no3cat,
         n2o,
         n2oeq,
         n2osat)

rm(predict_n2o, predict_temp, predict_no3) # clean up workspace for RAM
gc()
 

save(all_predictions, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
```
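
The assembly step rebuilds `.row` and `.draw` with `rep()`, which assumes the rows are ordered lake by lake with draws nested inside each lake. A minimal sketch of that indexing pattern on a tiny example (3 lakes, 2 draws) is below. Whether `predict_temp` keeps this ordering is an assumption that should be checked against the actual object.

```r
n_lakes <- 3
n_draws <- 2

idx <- data.frame(
  .row = rep(seq_len(n_lakes), each = n_draws),   # 1 1 2 2 3 3
  .draw = rep(seq_len(n_draws), times = n_lakes)  # 1 2 1 2 1 2
)

# Every lake-by-draw combination should appear exactly once
stopifnot(nrow(idx) == n_lakes * n_draws)
stopifnot(anyDuplicated(idx) == 0)

idx
```

The same two checks scale directly to the full frame (465897 lakes by 500 draws).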

# Population estimates
A number of estimates for the target population were assembled and are presented below. First, the full posterior predictive distributions for dissolved N2O, equilibrium N2O, and the saturation ratio were assessed. These distributions summarized the predicted distribution of concentrations or ratios for all lakes in the population of interest and included parameter uncertainty propagated through the model. Next, population means were assessed, followed by comparisons of some model-based estimates to previously calculated design-based estimates.

## Posterior predictive distributions
Below, density plots summarize the posterior predictive distributions of the N2O and N2O-eq concentrations and the saturation ratio across the target population of lakes, based on 500 draws from the posterior predictive distribution. Note that the x-axis was truncated at 50 nmol/L for the concentrations (and at 8 for the saturation ratio) for a clearer visualization of the bulk of the predicted distribution. For reference, the maximum predicted value was 4403.2 nmol/L for dissolved N2O, 20.4 nmol/L for equilibrium N2O, and 793.5 for the saturation ratio.
```{r plot_n2o_posterior_preds, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  slice_sample(n=1e4) %>% # simple random sample 10k lakes
  ggplot(aes(x = n2o)) +
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  scale_y_log10() +
  xlim(0, 50) +
  xlab("Dissolved N2O concentration") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_posterior_preds, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  slice_sample(n=1e4) %>% # simple random sample 10k lakes
  ggplot(aes(x = n2oeq)) +
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  scale_y_log10() +
  xlim(0, 50) +
  xlab("Equilibrium N2O concentration") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_posterior_preds, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  slice_sample(n=1e4) %>% # simple random sample 10k lakes
  ggplot(aes(x = n2osat)) +
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  scale_y_log10() +
  xlim(0, 8) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  xlab("N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

## Estimated means
### National
Below are density plots summarizing the posterior distribution of _means_ for N2O concentrations and the saturation ratio for the target population (i.e., all US lakes > 1ha in the lower 48 states).
```{r plot_n2o_nat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(mean_n2o = mean(n2o)) %>%
  ggplot(aes(x = mean_n2o)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean dissolved N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_nat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(mean_n2o = mean(n2oeq)) %>%
  ggplot(aes(x = mean_n2o)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean equilibrium N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_nat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(mean_sat = mean(n2o/n2oeq)) %>%
  ggplot(aes(x = mean_sat)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

To illustrate the skewness in the predictive distribution for the saturation ratio, an estimate of the median ratio is shown below. The entire posterior distribution of the mean above was larger than 1, the ratio representing the boundary between under- and oversaturation. By comparison, the posterior distribution of the median below included only values less than 1, suggesting that although the mean saturation ratio was greater than 1, most lakes in the national population were undersaturated (i.e., ratio less than 1). In right-skewed distributions, the mean can be considerably larger than the median.
```{r plot_sat_nat_posterior_median, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(median_sat = median(n2o/n2oeq)) %>%
  ggplot(aes(x = median_sat)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("median N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```
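
As a small worked example of a mean above 1 alongside a median below 1, consider a lognormal distribution, for which the median is $e^{\mu}$ and the mean is $e^{\mu + \sigma^2/2}$. The parameter values below are hypothetical and chosen only to illustrate the point; they are not estimates from the model.

```r
set.seed(5)
meanlog <- -0.1
sdlog <- 1

# Closed-form lognormal median and mean
median_theory <- exp(meanlog)               # below 1
mean_theory <- exp(meanlog + sdlog^2 / 2)   # above 1

# Simulated "saturation ratios" from the same distribution
x <- rlnorm(1e5, meanlog = meanlog, sdlog = sdlog)

c(median = median(x),
  mean = mean(x),
  prop_below_1 = mean(x < 1))
```

In this example more than half of the simulated ratios fall below 1 even though their mean exceeds 1, which mirrors the pattern in the national estimates.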

Below is a plot of the posterior distribution for the proportion of undersaturated lakes at the national scale.
```{r plot_undersat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row))) %>%
  ggplot(aes(x = prop_sat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  xlab("Proportion of undersaturated waterbodies") +
  ylab("density") +
  theme_tidybayes()
```

### Ecoregion
Below are posterior estimates of the means for dissolved and equilibrium N2O and the saturation ratio by WSA9 ecoregion.
```{r plot_n2o_wsa9_posterior_mean, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = mean_n2o, 
                        y = reorder(ecoregion, mean_n2o)), 
                    quantiles = 100,
                    .width = c(0.5, 0.95)) +
  xlab("mean dissolved N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_wsa9_posterior_mean, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2oeq = mean(n2oeq), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = mean_n2oeq, 
                        y = reorder(ecoregion, mean_n2oeq)), 
                    .width = c(0.5, 0.95)) +
  xlab("mean equilibrium N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_wsa9_posterior_mean, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_sat = mean(n2o/n2oeq), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = mean_sat,
                        y = reorder(ecoregion, mean_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("mean N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

A plot of the posterior estimates for the median saturation ratio below indicated, again, that most lakes in each ecoregion were undersaturated (i.e., median << 1).
```{r plot_sat_wsa9_posterior_median, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(median_sat = median(n2o/n2oeq), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = median_sat,
                        y = reorder(ecoregion, median_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("median N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

A plot of the estimates of the proportion of under-saturated lakes by ecoregion is below.
```{r plot_prop_sat_wsa9_posterior_median, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row)), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = prop_sat,
                        y = reorder(ecoregion, prop_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("Proportion of undersaturated lakes") +
  ylab("density") +
  theme_tidybayes()
```
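
The per-draw proportion computed in the chunk above can be illustrated on a toy data set. The sketch below is not part of the analysis: the data frame `toy_draws` and its values are made up, and only base R is used. It suggests that, as long as each lake (`.row`) appears exactly once per draw, `sum(n2osat < 1) / length(unique(.row))` gives the same result as `mean(n2osat < 1)`.

```r
# Hypothetical toy predictions: 4 lakes (.row) x 3 posterior draws (.draw).
# expand.grid() varies .row fastest, so each block of 4 values is one draw.
toy_draws <- expand.grid(.row = 1:4, .draw = 1:3)
toy_draws$n2osat <- c(0.8, 0.9, 1.2, 0.7,   # draw 1: 3 of 4 lakes < 1
                      0.6, 1.1, 1.3, 0.95,  # draw 2: 2 of 4 lakes < 1
                      0.5, 0.7, 0.9, 0.85)  # draw 3: 4 of 4 lakes < 1

# Proportion of undersaturated lakes within each draw, as counted in the report.
prop_by_count <- sapply(split(toy_draws, toy_draws$.draw),
                        function(d) sum(d$n2osat < 1) / length(unique(d$.row)))

# Equivalent shortcut: the mean of a logical vector is a proportion.
prop_by_mean <- tapply(toy_draws$n2osat < 1, toy_draws$.draw, mean)

print(prop_by_count)
print(prop_by_mean)
```

The two forms would differ only if some lake were missing from, or duplicated within, a draw.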

### State
Comparisons of mean estimates by state (posterior median with 2.5th and 97.5th percentiles) are below. Density estimates were omitted to save plot space.
```{r plot_state_mean_n2o, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate = round(median(mean_n2o), 1),
    LCL = round(quantile(mean_n2o, probs = 0.025), 1),
    UCL = round(quantile(mean_n2o, probs = 0.975), 1),
    .groups = "drop") %>% 
  select(state, estimate, LCL, UCL) %>%
  mutate(state = forcats::fct_reorder(state, estimate)) %>%
  ggplot(aes(x = state, y = estimate )) +
  geom_point(position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL),
                 position=position_dodge(width=0.5)) +
  ylab("mean dissolved N2O") +
  scale_y_continuous(position = "left") + 
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```
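
The two-stage summary used in these state-level plots (average over lakes within each draw, then take quantiles across draws) can be sketched with simulated values. Everything in the block below is hypothetical: the state labels `AA` and `BB`, the lognormal parameters, and the numbers of lakes and draws are chosen only for illustration, and base R stands in for the `dplyr` pipeline.

```r
set.seed(42)
# Hypothetical predictions: 2 states x 20 lakes x 200 posterior draws.
toy <- expand.grid(lake = 1:20, state = c("AA", "BB"), .draw = 1:200,
                   stringsAsFactors = FALSE)
toy$n2o <- rlnorm(nrow(toy),
                  meanlog = ifelse(toy$state == "AA", 2.0, 2.3),
                  sdlog = 0.3)

# Step 1: mean over lakes within each state and draw (states x draws matrix).
per_draw <- tapply(toy$n2o, list(toy$state, toy$.draw), mean)

# Step 2: summarise each state's per-draw means across draws.
summary_tab <- t(apply(per_draw, 1, quantile, probs = c(0.025, 0.5, 0.975)))
colnames(summary_tab) <- c("LCL", "estimate", "UCL")

print(round(summary_tab, 1))
```

Summarising in this order keeps the uncertainty of the group mean itself, rather than the spread among individual lakes.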

```{r plot_state_mean_n2oeq, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(mean_n2oeq = mean(n2oeq), .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate = round(median(mean_n2oeq), 1),
    LCL = round(quantile(mean_n2oeq, probs = 0.025), 1),
    UCL = round(quantile(mean_n2oeq, probs = 0.975), 1),
    .groups = "drop") %>% 
  select(state, estimate, LCL, UCL) %>%
  mutate(state = forcats::fct_reorder(state, estimate)) %>%
  ggplot(aes(x = state, y = estimate )) +
  geom_point(position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL),
                 position=position_dodge(width=0.5)) +
  ylab("mean equilibrium N2O") +
  scale_y_continuous(position = "left") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```

Below is a plot of estimates for the mean (black circles) and median (grey circles) saturation ratio by state. A horizontal, dashed, black line is shown at ratio = 1, marking the boundary between under- and oversaturation. Only a few states (e.g., NV, DE) had median estimates of 1 or greater, suggesting that, for most states, most lakes were undersaturated.
```{r plot_state_mean_median_sat, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(mean_sat = mean(n2osat), 
            median_sat = median(n2osat),
            .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate_mean = round(median(mean_sat), 4),
    LCL_mean = round(quantile(mean_sat, probs = 0.025), 4),
    UCL_mean = round(quantile(mean_sat, probs = 0.975), 4),
    estimate_median = round(median(median_sat), 4),
    LCL_median = round(quantile(median_sat, probs = 0.025), 4),
    UCL_median = round(quantile(median_sat, probs = 0.975), 4),
    .groups = "drop") %>% 
  select(state, 
         estimate_mean, 
         estimate_median, 
         LCL_mean,
         LCL_median,
         UCL_mean,
         UCL_median) %>%
  mutate(state = forcats::fct_reorder(state, estimate_mean)) %>%
  ggplot(aes(x = state, y = estimate_mean )) +
  geom_point(position=position_dodge(width=0.5),
             size = 2) +
  geom_linerange(aes(ymin = LCL_mean, ymax = UCL_mean),
                 position=position_dodge(width=0.5)) +
  geom_point(aes(x = state, y = estimate_median), 
             position=position_dodge(width=0.5),
             color = "grey",
             size = 2) +
  geom_linerange(aes(ymin = LCL_median, ymax = UCL_median),
                 position=position_dodge(width=0.5),
                 color = "gray") +
  ylab("mean and median N2O saturation ratio") +
  scale_y_continuous(position = "left") +
  geom_hline(yintercept = 1, color = "black", linetype = "dashed") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```
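
A mean estimate can sit above 1 while the median sits below it when the distribution of saturation ratios is right-skewed. The sketch below illustrates this with simulated lognormal ratios; the vector `toy_sat` and the `meanlog` and `sdlog` values are arbitrary assumptions and were not estimated from the NLA data.

```r
set.seed(1)
# Hypothetical saturation ratios for 10,000 lakes drawn from a right-skewed
# lognormal distribution (parameters are arbitrary illustration values).
toy_sat <- rlnorm(10000, meanlog = log(0.9), sdlog = 0.8)

toy_mean       <- mean(toy_sat)      # pulled upward by a few large ratios
toy_median     <- median(toy_sat)    # value for the "typical" lake
toy_prop_under <- mean(toy_sat < 1)  # share of undersaturated lakes

print(c(mean = toy_mean, median = toy_median, prop_under = toy_prop_under))
```

In this toy setting, most lakes are undersaturated even though the mean ratio exceeds 1, which is why the median and the proportion below 1 are reported alongside the mean.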

Finally, the plot below shows the estimated proportion of undersaturated lakes for each state in the target population. Points are posterior medians of the proportion, and bars span the central 95% interval (2.5th to 97.5th percentiles) of the posterior distribution.
```{r plot_state_prop_sat, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row)),
            .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate = round(median(prop_sat), 4),
    LCL = round(quantile(prop_sat, probs = 0.025), 4),
    UCL = round(quantile(prop_sat, probs = 0.975), 4),
    .groups = "drop") %>% 
  select(state, 
         estimate, 
         LCL,
         UCL) %>%
  mutate(state = forcats::fct_reorder(state, estimate)) %>%
  ggplot(aes(x = state, y = estimate)) +
  geom_point(position=position_dodge(width=0.5),
             size = 2) +
  geom_linerange(aes(ymin = LCL, ymax = UCL),
                 position=position_dodge(width=0.5)) +
  ylab("Proportion of undersaturated lakes") +
  scale_y_continuous(position = "left") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```

### Size category
The estimated means by size category are below for dissolved and equilibrium N2O and the saturation ratio. Median estimates for the saturation ratio are also shown.
```{r plot_n2o_size_posterior_mean, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2o),
            .groups = "drop") %>%
  ggplot(aes(x = mean_n2o, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean dissolved N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_size_posterior_mean, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2oeq = mean(n2oeq),
            .groups = "drop") %>%
  ggplot(aes(x = mean_n2oeq, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean equilibrium N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_size_posterior_mean, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat),
            .groups = "drop") %>%
  ggplot(aes(x = mean_sat, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "black") +
  xlab("mean N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_size_posterior_median, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(median_sat = median(n2osat),
            .groups = "drop") %>%
  ggplot(aes(x = median_sat, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "black") +
  xlab("median N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

The mean and median saturation ratio estimates by size category are compared below.
```{r plot_size_cat_mean_median_sat, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat), 
            median_sat = median(n2osat),
            .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate_mean = round(median(mean_sat), 2),
    LCL_mean = round(quantile(mean_sat, probs = 0.025), 2),
    UCL_mean = round(quantile(mean_sat, probs = 0.975), 2),
    estimate_median = round(median(median_sat), 2),
    LCL_median = round(quantile(median_sat, probs = 0.025), 2),
    UCL_median = round(quantile(median_sat, probs = 0.975), 2),
    .groups = "drop") %>% 
  select(size_cat, 
         estimate_mean, 
         estimate_median, 
         LCL_mean,
         LCL_median,
         UCL_mean,
         UCL_median) %>%
  mutate(size_cat = forcats::fct_reorder(size_cat, estimate_mean)) %>%
  ggplot(aes(x = size_cat, y = estimate_mean )) +
  geom_point(position=position_dodge(width=0.5),
             size = 2) +
  geom_linerange(aes(ymin = LCL_mean, ymax = UCL_mean),
                 position=position_dodge(width=0.5)) +
  geom_point(aes(x = size_cat, y = estimate_median), 
             position=position_dodge(width=0.5),
             color = "grey",
             size = 2) +
  geom_linerange(aes(ymin = LCL_median, ymax = UCL_median),
                 position=position_dodge(width=0.5),
                 color = "gray") +
  ylab("mean and median N2O saturation ratio") +
  scale_y_continuous(position = "left") +
  geom_hline(yintercept = 1, color = "black", linetype = "dashed") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```

Finally, the estimated proportion of undersaturated lakes in the target population by size category is shown below.
```{r plot_prop_sat_size_cat_posterior_median, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row)), .groups = "drop") %>%
  ggplot() + 
  stat_slabinterval(aes(x = prop_sat,
                        y = reorder(size_cat, prop_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("Proportion of undersaturated lakes") +
  ylab("density") +
  theme_tidybayes()
```

## Model- _vs._ design-based
Below, estimates from the model-based approach are compared to design-based estimates. In general, the model estimates were similar to the design-based estimates. Model estimates were typically within the confidence bounds of the design-based estimates, but with much greater precision. Improved precision was expected due to the "shrinkage" induced by the multilevel parameterization, which allowed some "borrowing" of information across the various levels of the survey factors. 
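
The shrinkage described above can be sketched with the textbook partial-pooling formula for group means under a normal model with known variances. This is only an illustration: the group labels, sample sizes, and the values of `sigma`, `tau`, and `mu` are hypothetical, and the models in this document estimate such quantities rather than fixing them.

```r
# Hypothetical group summaries: sample mean and sample size per group.
ybar <- c(A = 6.0, B = 8.0, C = 12.0)
n    <- c(A = 50,  B = 5,   C = 2)

sigma <- 4  # assumed within-group standard deviation
tau   <- 2  # assumed between-group standard deviation
mu    <- 8  # assumed overall mean

# Precision-weighted compromise between each group mean and the overall mean.
w_data  <- n / sigma^2
w_prior <- 1 / tau^2
pooled_mean <- (w_data * ybar + w_prior * mu) / (w_data + w_prior)

# The partially pooled estimate is more precise than the unpooled group mean.
pooled_sd   <- sqrt(1 / (w_data + w_prior))
unpooled_sd <- sigma / sqrt(n)

print(round(cbind(ybar, pooled_mean, unpooled_sd, pooled_sd), 2))
```

Groups with few observations (here `C`) are pulled most strongly toward the overall mean and gain the most precision, which is the pattern expected for sparsely sampled survey strata.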

### Dissolved N2O
Below, National mean estimates for dissolved N2O from the model and design-based approaches were compared. The sample-based estimate was also included as a naive reference.
```{r n2o_means_national, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_n2o = mean(n2o)) %>%
  summarise(estimate = round(median(mean_n2o), 2), # posterior median
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2)) %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  add_row(estimate = round(mean(df_model$n2o), 2),
          type = "sample") %>%
  print()
```

The black, vertical, dashed line in the figure below represents the mean of the sample. 
```{r plot_n2o_means_national, echo=FALSE, fig.align='center', fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_n2o = mean(n2o)) %>%
  summarise(estimate = round(median(mean_n2o), 2), # posterior median
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2)) %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  mutate(cl_width = round(UCL - LCL, 2)) %>%
  ggplot(aes(x = type, y = estimate, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  geom_hline(yintercept = round(mean(df_model$n2o), 2), 
             linetype = "dashed",
             color = "black") +
  ylab("mean N2O concentration") +
  ggtitle("National estimate comparison") +
  coord_flip() + 
  theme_tidybayes()
```

Below, estimates were compared by ecoregion.
```{r n2o_mean_wsa9, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  bind_rows(cbind(n2o_survey_ests[-10,], type = rep("survey", 9))) %>%
  arrange(ecoregion) %>%
  print()
```

```{r plot_mean_n2o_wsa9, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  bind_rows(cbind(n2o_survey_ests[-10,], type = rep("survey", 9))) %>%
  ggplot(aes(x = ecoregion, y = estimate, group = type, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  coord_flip() + 
  theme_tidybayes() +
  ylab("mean dissolved N2O") +
  ggtitle("Ecoregion estimates comparison")
```

Means were compared according to size categories below.
```{r table_size_mean_n2o, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests_size.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_n2o), 1),
    LCL = round(quantile(mean_n2o, probs = 0.025), 1),
    UCL = round(quantile(mean_n2o, probs = 0.975), 1),
    .groups = "drop") %>%
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size = factor(size_cat)) %>%
  mutate(size = forcats::fct_reorder(size, estimate)) %>%
  mutate(cl_width = UCL - LCL) %>%
  arrange(size) %>%
  select(size, estimate, LCL, UCL, type) %>% 
  print()
```

```{r plot_size_mean_n2o, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests_size.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  #mutate(size_cat = factor(size_cat)) %>%
  mutate(type = "model") %>%
  select(size_cat, estimate, LCL, UCL, type) %>%
  bind_rows(cbind(n2o_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size_cat = forcats::fct_reorder(size_cat, estimate)) %>%
  ggplot(aes(x = size_cat, y = estimate, group = type, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  coord_flip() + 
  theme_tidybayes() +
  ylab("mean dissolved N2O") +
  ggtitle("Size category estimates comparison")
```

### Saturation
Below, the same comparisons were made for the saturation estimates.
```{r table_nat_sat_mean, message=FALSE, warning=FALSE, fig.align='center', fig.height=6, fig.width=8}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  add_row(estimate = round(mean(df_model$n2o / df_model$n2o_eq), 3),
          type = "sample") %>%
  print()
```

```{r plot_nat_sat_mean, echo=FALSE, fig.align='center', fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_sat = mean(n2osat)) %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3)) %>%  
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  ggplot(aes(x = type, y = estimate, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  geom_hline(yintercept = round(mean(df_model$n2o / df_model$n2o_eq), 3), 
             linetype = "dashed",
             color = "black") +
  ylab("mean N2O saturation ratio") +
  ggtitle("National estimates comparison") +
  coord_flip() + 
  theme_tidybayes()
```

Below, the mean saturation ratio estimates were compared by ecoregion.
```{r plot_wsa9_sat_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise( mean_sat = mean(n2osat), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise( estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  bind_rows(cbind(sat_survey_ests[-10,], type = rep("survey", 9))) %>%
  mutate(cl_width = UCL - LCL) %>%
  ggplot(aes(x = ecoregion, y = estimate, group = type, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  geom_hline(yintercept = 1, color = "lightgrey") +
  ylab("mean N2O saturation ratio") +
  ggtitle("Ecoregion estimates comparison") +
  coord_flip() + 
  theme_tidybayes()
```

```{r table_size_sat_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests_size.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3)) %>%
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size = factor(size_cat)) %>%
  mutate(cl_width = UCL - LCL) %>%
  arrange(size) %>%
  select(size, estimate, LCL, UCL, type) %>% 
  print()
```

```{r plot_size_sat_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
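load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests_size.rda")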

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>%
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size = factor(size_cat)) %>%
  select(size, estimate, LCL, UCL, type) %>% 
  ggplot(aes(x = size, y = estimate, group = type, color = type)) +
  geom_point(position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  ylab("mean N2O saturation ratio") +
  ggtitle("Size category estimates comparison") +
  coord_flip() + 
  theme_tidybayes()
```

# References

# Session Info
```{r session}
sessionInfo()
```